Abstract

We introduce and analyze a framework for function interpolation using compressed sensing. 
This framework, which is based on weighted $\ell^1$ minimization, does not require a priori bounds on the expansion tail in either its implementation or its theoretical guarantees, and in the absence of noise leads to genuinely interpolatory approximations. We also establish a new recovery guarantee for compressed sensing with weighted $\ell^1$ minimization based on this framework. This
guarantee has several key benefits. First, unlike existing results, it is sharp (up to constants 
and log factors) for large classes of functions regardless of the choice of weights. Second, by 
examining the measurement condition in the recovery guarantee, we are able to suggest a good 
overall strategy for selecting the weights. In particular, when applied to the important case 
of multivariate approximation with orthogonal polynomials, this weighting strategy leads to 
provably optimal estimates on the number of measurements required, whenever the support set 
of the significant coefficients is a so-called lower set. Finally, this guarantee can also be used 
to theoretically confirm the benefits of alternative weighting strategies where the weights are 
chosen based on prior support information. This provides a theoretical basis for a number of 
recent numerical studies showing the effectiveness of such approaches. 


1 Introduction 

Many problems in science and engineering require the approximation of smooth, multivariate functions from finitely many pointwise samples. Although this is a classical problem of approximation theory, there has recently been a renewed focus in this area, driven in part by applications in uncertainty quantification. Problems in this area are typically high dimensional and place severe limitations on the number of measurements that can be acquired. At the same time, developments in the field of compressed sensing (CS) have shown that it is often possible to recover high-dimensional vectors possessing certain low-dimensional structures from substantially reduced sets of linear measurements […]. It is known that smooth, high-dimensional functions have approximately sparse expansions in certain orthogonal systems (e.g. tensor Chebyshev or Legendre polynomials). Hence, in recent years there has been an increasing focus on applying the theory and techniques of CS to accurately compute such expansions […].

However, the application of CS to function approximation raises several issues. First, standard CS concerns itself primarily with the recovery of sparse vectors in finite-dimensional vector spaces. Functions, on the other hand, live in infinite-dimensional spaces. Whilst they may be well approximated by finite sums in certain orthogonal polynomial systems, their expansion in such a system is typically infinite. As one might expect, this mismatch presents a number of key practical and theoretical issues. Second, the coefficients of a function in a polynomial basis are not just sparse, but tend to possess additional structure. This raises questions about how to exploit such structure in the reconstruction process, and about the expected theoretical benefits of doing so. Third, it is well known that naive approximations of high-dimensional functions suffer from the curse of dimensionality. Therefore, a question of singular interest is whether, and to what extent, CS provides a means to avoid this crucial issue.

In this paper we introduce a framework and a series of recovery guarantees for the use of CS in approximating multivariate functions from limited numbers of pointwise samples. Unlike most existing approaches to this problem, which can loosely be described as discretizing first, in our framework the recovery problem is first formulated as a (weighted) $\ell^1$ minimization problem in an infinite-dimensional space, and then discretized. This brings a number of benefits. First, our techniques do not require a priori estimates for the expansion tail, as is common in other approaches. Second, in the absence of noise, our techniques lead to exactly interpolating approximations, a property that is desirable in general as well as for certain applications. Third, much like our techniques themselves, our theoretical results do not assume a priori estimates for the expansion tail, as has been necessary in previous works. Since infinite expansions are handled faithfully, continuing a line of investigation initiated in [2], we refer to this framework as infinite-dimensional compressed sensing.

In order to exploit the structure of polynomial coefficients of smooth functions we use a weighted $\ell^1$ minimization approach. Incorporating weights into the reconstruction problem has received some attention of late, especially for polynomial approximations, due to its potential for enhancing accuracy; see […], as well as […] for some theoretical analysis. In this paper, we introduce a new recovery guarantee for CS for very general choices of optimization weights. For the special case of unweighted $\ell^1$ minimization, our guarantees reduce to those introduced previously in [23, …] (although our theorems avoid the issues surrounding infinite expansions mentioned above). For weighted $\ell^1$ minimization, a corollary of our main result yields guarantees similar to those in [37]. As we demonstrate, however, these guarantees are not sharp for a large class of functions and weights. Fortunately, by returning to our abstract result we derive an improved recovery guarantee which is sharp for this class.

We next use our main result to identify a good overall weighting strategy, in the sense that it minimizes the number of measurements required in our recovery guarantee. Similarly to recent work […], we show that this choice of weights yields a reconstruction procedure for tensor Chebyshev and Legendre polynomial expansions that mitigates the curse of dimensionality to a substantial extent. This is on the proviso that the coefficients of the function being approximated possess a particular type of structured sparsity defined by so-called lower sets, which is a reasonable assumption in applications of interest (see […, 29, 30] and references therein). Finally, we apply our abstract recovery guarantee to assess an alternative strategy where the weights are chosen based on prior support information, a strategy which has been advocated in a number of recent works [33, …]. When the weights are chosen in this way, our recovery guarantee provides a theoretical basis for the empirically observed benefits of this approach.

The outline of the remainder of this paper is as follows. In §2 we present relevant background material on CS for function approximation, and give an overview of our main contributions. We give some preliminary notation in §3. In §4 we introduce the infinite-dimensional weighted $\ell^1$ minimization problem, and in §5 we present the main examples that will be used to demonstrate our results. Our main abstract recovery guarantee is presented in §6, and in §7 we discuss various consequences of it, including its application to high-dimensional approximation using polynomials. Finally, in §8 we present the proof of our main result.
2 Background and overview

In this section, we first give an overview of previous work on CS for function approximation and then present a summary of the main results of this paper.


2.1 Previous work 

In recent years, numerous works have sought to apply the techniques of CS to multivariate function approximation. Theoretical guarantees for recovery of sparse polynomial expansions via unweighted $\ell^1$ minimization were first presented in [36] for the univariate case and [44] for the multivariate case. Weighted $\ell^1$ minimization was proposed in [33, …], and theoretical results based on weighted sparsity were presented in […]. For theoretical results on lower sets related to this paper, see [12]. Applications of CS techniques to uncertainty quantification have been pursued in numerous works, including […, 27, 33, …] and references therein. Besides random sampling (from appropriate continuous measures), several works have also proposed new sampling strategies for CS that aim to improve reconstruction quality. These include coherence-based sampling [23], preconditioning [24, …], deterministic sampling [43] and subsampling from deterministic Gaussian quadratures [39]. Outside of CS theory, worst-case recovery guarantees for weighted $\ell^1$ minimization with deterministic sampling were presented in […], demonstrating near-optimal performance for general scattered data.


2.2 Compressive function approximation 

Let $\{\phi_i\}_{i \in \mathbb{N}}$ be an orthonormal basis of functions (e.g. polynomials) and consider a function $f = \sum_{i \in \mathbb{N}} x_i \phi_i$. Suppose that $\{t_i\}_{i=1}^{m}$ is a finite set of points, typically chosen randomly from an appropriate distribution, and consider the measurements $y = \{f(t_i)\}_{i=1}^{m}$. In order to approximate $f$, it suffices to approximate its coefficients $x$ from the measurements $y$. However, $x$ is typically an infinite vector, meaning that some sort of discretization is required. In a majority of previous works, this discretization is performed first; that is, prior to formulating the optimization problem. Specifically, one introduces a fixed $N \geq m$ and seeks to approximate the first $N$ coefficients $x_1, \ldots, x_N$ of $x$ by solving the following inequality-constrained (weighted) $\ell^1$ minimization problem:

$$\min_{z \in \mathbb{C}^N} \|z\|_{1,w} \ \text{subject to} \ \|Az - y\| \leq \delta. \qquad (2.1)$$

Here $\delta \geq 0$, $\|z\|_{1,w} = \sum_{i=1}^{N} w_i |z_i|$ is the weighted $\ell^1$-norm on $\mathbb{C}^N$ with weights $w_i > 0$ (we discuss the issue of weights next) and $A = \{\phi_j(t_i)\}_{i=1,j=1}^{m,N}$. The parameter $\delta$ is an artefact of the discretization, and is chosen so that the exact coefficients are feasible for (2.1). Since each entry of $A(x_1, \ldots, x_N)^\top - y$ is bounded in modulus by $\|f - \sum_{i=1}^{N} x_i \phi_i\|_{L^\infty}$, this holds whenever

$$\sqrt{m}\, \Big\| f - \sum_{i=1}^{N} x_i \phi_i \Big\|_{L^\infty} \leq \delta. \qquad (2.2)$$

If $\hat{x}$ is a minimizer of (2.1), then one defines the corresponding approximation to $f$ as $\tilde{f} = \sum_{i=1}^{N} \hat{x}_i \phi_i$.

As discussed in […], the error committed by the approximation $\tilde{f}$ depends on the choice of $\delta$. Hence a good estimate of the norm of the expansion tail is important to ensure accurate results [15]. Herein lies a problem: in general, this tail error is unknown. Whilst techniques such as cross validation [13, …] have been used to provide practical estimates for $\delta$, these are both computationally expensive and wasteful in terms of the data. A key element of the framework we develop in this paper is that it does not require knowledge of $\delta$, and thus allows one to avoid this estimation step.
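The dependence on the tail can be made concrete with a small sketch. It assumes NumPy, orthonormal Chebyshev polynomials ($\phi_0 = T_0$, $\phi_j = \sqrt{2}\,T_j$) sampled from the Chebyshev measure, and a hypothetical coefficient sequence $x_j = 0.9^j$ whose first 400 terms stand in for an infinite expansion. Because $x$ is known here by construction, we can compute the smallest $\delta$ that makes the first $N$ coefficients feasible for (2.1) and compare it with the crude bound $\sqrt{m}\,\sqrt{2}\sum_{j \geq N}|x_j|$. In practice neither quantity is available, which is exactly the difficulty described above.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def cheb_matrix(t, n_cols):
    """Matrix {phi_j(t_i)} for orthonormal Chebyshev polynomials phi_0, ..., phi_{n_cols-1}."""
    V = C.chebvander(t, n_cols - 1)  # columns T_0(t), ..., T_{n_cols-1}(t)
    V[:, 1:] *= np.sqrt(2.0)         # phi_j = sqrt(2) T_j for j >= 1
    return V


rng = np.random.default_rng(0)
m, N, n_true = 40, 80, 400             # hypothetical sizes
t = np.cos(np.pi * rng.random(m))      # i.i.d. samples from the Chebyshev measure
x = 0.9 ** np.arange(n_true)           # hypothetical decaying coefficients of f

y = cheb_matrix(t, n_true) @ x         # y_i = f(t_i)
A = cheb_matrix(t, N)                  # discretize first: keep N columns

# Smallest delta for which (x_1, ..., x_N) is feasible for (2.1)
delta_min = np.linalg.norm(A @ x[:N] - y)
# Since |phi_j| <= sqrt(2), each residual entry is at most sqrt(2) * sum_{j >= N} |x_j|
tail_bound = np.sqrt(m) * np.sqrt(2.0) * np.abs(x[N:]).sum()
print(f"delta_min = {delta_min:.3e}, bound = {tail_bound:.3e}")
```

Changing the decay rate of `x` or the truncation `N` shows how sensitive the admissible $\delta$ is to the (normally unknown) tail.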
In addition to this practical issue, from a theoretical perspective all existing CS recovery guarantees for (2.1) (see […]) assume a priori knowledge of the tail $\|f - \sum_{i=1}^{N} x_i \phi_i\|_{L^\infty}$. Whilst this is not a problem when $f$ is itself a polynomial of known degree $N$, such guarantees are less informative about the approximation capabilities of weighted $\ell^1$ minimization for objects with infinite expansions, i.e. functions. The theoretical guarantees we present in this paper seek to overcome this limitation.


2.3 Weighted $\ell^1$ minimization

In a number of recent works it has been observed empirically that unweighted $\ell^1$ minimization (i.e. with $w_i = 1$, $\forall i$) often gives relatively poor approximations, and that better results are possible if slowly growing weights $w_i$ are introduced […]. For deterministic samples this was explained recently in […]. Therein it was shown that unweighted $\ell^1$ minimization is in general unsuitable for function approximation, since it suffers from a so-called aliasing phenomenon stemming from the infinite-dimensionality of the problem. In the unweighted case, there are solutions $\hat{x} = \{\hat{x}_i\}_{i \in \mathbb{N}}$ to the optimization problem with nonzero coefficients far out in the expansion tail. The corresponding approximation $\tilde{f} = \sum_{i \in \mathbb{N}} \hat{x}_i \phi_i$ interpolates the data, but oscillates rapidly in between the data points, due to nonzero high-frequency modes. The introduction of slowly growing weights removes this phenomenon, however, since high-frequency modes are penalized by the increasing weights.

The focus of this paper is on samples drawn randomly from an appropriate distribution. Unlike for deterministic samples, approximations computed by solving the unweighted $\ell^1$ minimization problem do converge (with high probability) as the number of samples increases. Yet adding weights generally leads to a more rapidly decreasing approximation error in this setting […]; see Fig. 1 for an illustration. A central contribution of this paper is to understand and quantify this benefit theoretically. Our basis for this is a new recovery guarantee for weighted $\ell^1$ minimization.


2.4 Recovery guarantees 

Standard CS theory states that one can recover a vector $x$ of sparsity $s$, i.e.

$$s = |\Delta| = \sum_{i \in \Delta} 1, \qquad \Delta = \{i : x_i \neq 0\}, \qquad (2.3)$$

using, up to log factors, $m \approx s$ appropriately chosen measurements, regardless of the locations of the nonzero entries of $x$. In practice, this can be achieved by solving an $\ell^1$ minimization problem. When considering weighted $\ell^1$ minimization, on the other hand, it was proposed in […] to replace the sparsity (2.3) by the following weighted sparsity measure

$$s = |\Delta|_w = \sum_{i \in \Delta} w_i^2, \qquad \Delta = \{i : x_i \neq 0\},$$

corresponding to the optimization weights $w = \{w_i\}_{i \in \mathbb{N}}$. The work of […] has established a measurement condition of the form

$$m \gtrsim |\Delta|_w \times \text{log factors}, \qquad (2.4)$$

for weighted $\ell^1$ minimization with appropriate measurements.

Unfortunately, such guarantees do not explain the observed empirical performance of weighted $\ell^1$ minimization in some important cases. For example, suppose that $\Delta = \{1, \ldots, M\}$. That is, $x$ is nonzero in its first $M$ entries or, more generally (for inexact sparsity), the largest $M$ entries of $x$ are also its first $M$ entries. This phenomenon is common in polynomial expansions of smooth functions, especially in lower-dimensional settings. Herein coefficients tend to exhibit decay (see Fig. 1), with the largest $M$ coefficients often coinciding with the first $M$ or, more generally, the first $\mathcal{O}(M)$. A simple example of this is an oscillatory function, for which the coefficients $x_i$ are $\mathcal{O}(1)$ up to a certain resolution criterion, after which they are numerically zero (see Fig. 1).

[Figure 1. Panels: CC, $f(t) = \cos(36\sqrt{2}\,t + 1/3)$; LC; LU, $f(t) = \sqrt{1.05 + t}$.]

Figure 1: Top row: The error $\|f - \tilde{f}\|_{L^\infty}$ (averaged over 50 trials) against $m$ for Chebyshev (C) or Legendre (L) polynomials with points drawn from the Chebyshev (C) or uniform (U) measure. Here $\tilde{f} = \sum_{i \in I_K} \hat{x}_i \phi_i$, where $\hat{x}$ is a solution of (4.2) and $I_K = \{0, \ldots, K\}$. The parameter $K = 1000$ was used and the weights were taken to be $w_i = (i+1)^\alpha$ for various $\alpha > 0$. Bottom row: Chebyshev or Legendre coefficients of the function $f(t)$. In this and all other experiments in this paper, we use the SPGL1 package […] with a maximum of 100,000 iterations.

Suppose for simplicity that the weights $w_i = i^\alpha$ are taken to be polynomially growing with index $\alpha$. Since $|\Delta|_w = \sum_{i=1}^{M} i^{2\alpha} \asymp M^{2\alpha+1}$, the recovery guarantee (2.4) reduces to

$$m \gtrsim M^{2\alpha+1} \times \text{log factors}. \qquad (2.5)$$

In other words, more rapidly growing weights seemingly require more measurements to recover $x$. However, this conclusion is at odds with the numerical results shown in Fig. 1, wherein it is observed that increasing the weights in fact decreases the error to a moderate degree, and certainly does not worsen it. This example is representative of quite general behaviour of weighted $\ell^1$ minimization.
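The growth predicted by (2.5) can be seen directly with a minimal NumPy sketch: for $\Delta = \{1, \ldots, M\}$ and $w_i = i^\alpha$ it evaluates the weighted cardinality $|\Delta|_w = \sum_{i=1}^{M} i^{2\alpha}$ and compares it with its leading-order size $M^{2\alpha+1}/(2\alpha+1)$. The values of $M$ and of $\alpha$ are arbitrary choices for illustration.

```python
import numpy as np


def weighted_cardinality(support, w):
    """|Delta|_w = sum_{i in Delta} w_i^2, with Delta given as 1-based indices into w."""
    idx = np.asarray(support) - 1
    return float(np.sum(np.asarray(w, dtype=float)[idx] ** 2))


M, K = 50, 1000
i = np.arange(1, K + 1, dtype=float)
support = np.arange(1, M + 1)            # Delta = {1, ..., M}

for alpha in (0.0, 0.5, 1.0, 2.0):
    card_w = weighted_cardinality(support, i ** alpha)   # w_i = i^alpha
    predicted = M ** (2 * alpha + 1) / (2 * alpha + 1)   # leading-order size
    print(f"alpha={alpha}: |Delta|_w={card_w:.4g}, M^(2a+1)/(2a+1)={predicted:.4g}")
```

Read through (2.4), these numbers would suggest that larger $\alpha$ needs many more samples, which is the conclusion contradicted by Fig. 1.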


Fortunately, it transpires that (2.5) is not sharp. A corollary of our main result gives an estimate for this example which is both independent of the weights, in certain cases, and provably sharp.


2.5 Contributions 


Our main contribution is a general result on infinite-dimensional CS for function interpolation based on weighted $\ell^1$ minimization. A simplified version (see Remark 2.3) of our main result is as follows:


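Theorem 2.1. Let $0 < \epsilon < e^{-1}$, $D \subseteq \mathbb{R}^d$ be a domain with a probability measure $\nu$, $\{\phi_i\}_{i \in \mathbb{N}}$ be an orthonormal system in $L^2_\nu(D)$ with $u_i = \|\phi_i\|_{L^\infty} < \infty$ for all $i \in \mathbb{N}$, $w = \{w_i\}_{i \in \mathbb{N}}$, $w_i > 0$, be weights and $f = \sum_{i \in \mathbb{N}} x_i \phi_i$. Let $K \in \mathbb{N}$ and $\Delta \subseteq \{1, \ldots, K\}$ be a subset with $\min_{i \in \{1,\ldots,K\} \setminus \Delta}\{w_i\} \geq 1$, and suppose that $t_1, \ldots, t_m$ are drawn independently from the measure $\nu$, where

$$m \gtrsim \Big( |\Delta|_u + \sup_{i \in \{1,\ldots,K\} \setminus \Delta} \{u_i^2 / w_i^2\} \, |\Delta|_w \Big) \cdot \log(\epsilon^{-1}) \cdot \log(|\Delta|_w K). \qquad (2.6)$$

Then, by solving a weighted $\ell^1$ minimization problem of size $m \times K$ with weights $w$, it is possible to approximate $x$ (and therefore $f$) from the data $y = \{f(t_i)\}_{i=1}^{m}$ up to an error proportional to

$$\sum_{i \notin \Delta} w_i |x_i| + T_{K,w}(x),$$

with probability at least $1 - \epsilon$, where $T_{K,w}(x)$ is the truncation error (6.2). Furthermore, in the absence of noise the resulting approximation interpolates the data $\{f(t_i)\}_{i=1}^{m}$.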


This result highlights the main contributions of this paper: 


1. Removal of a priori tail estimates. Theorem 2.1, unlike recovery guarantees based on (2.1), completely avoids the need for a priori knowledge of the expansion tail. In particular, it applies to arbitrary functions, not just finite expansions in the system $\{\phi_i\}_{i \in \mathbb{N}}$. The consequence of this is the additional truncation term $T_{K,w}(x)$ in the error estimate. But, as we explain in §6, this term can be estimated and is typically negligible for large enough $K$.

2. A weights-independent recovery guarantee. As in §2.4, let $\Delta = \{1, \ldots, M\}$ and $w_i = i^\alpha$. It is reasonable to assume that $u_i = \mathcal{O}(i^\beta)$ as $i \to \infty$ for some $\beta \geq 0$. If $\alpha \geq \beta$, then $|\Delta|_u \lesssim M^{2\beta+1}$, $\sup_{i > M} u_i^2/w_i^2 \lesssim M^{2\beta - 2\alpha}$ and $|\Delta|_w \asymp M^{2\alpha+1}$, so (2.6) reduces to

$$m \gtrsim M^{2\beta+1} \cdot \log(\epsilon^{-1}) \cdot \log(M^{2\alpha+1} K).$$

In contrast to (2.5), this condition is independent of the weights parameter $\alpha$. In particular, for the recovery of Chebyshev polynomial expansions, where $\beta = 0$, it reduces to $m = \mathcal{O}(M)$ (up to log factors), and for Legendre polynomial expansions, in which case $\beta = 1/2$, we obtain $m = \mathcal{O}(M^2)$. Both results are essentially sharp:


Remark 2.2. Clearly the result for Chebyshev polynomials is sharp up to log factors, since it is linear in the number of unknowns $M$. In […], using results from […], it was shown that no robust method can recover the first $M$ Legendre polynomial coefficients using asymptotically fewer than $m \asymp M^2$ measurements when the samples are exactly equidistributed (as opposed to randomly drawn) according to the uniform measure.


3. Identifying good weights. Log factors aside, the number of measurements stipulated by (2.6) depends on the factor

$$|\Delta|_u + \sup_{i \in \{1,\ldots,K\} \setminus \Delta} \{u_i^2 / w_i^2\} \, |\Delta|_w.$$

Note that the first term is independent of the optimization weights, and depends only on the intrinsic weights $u$. Therefore, in the absence of any further assumptions on $\Delta$ (see below for this case), a good weighting strategy would seek to make the second term equal to the first. This can be achieved by setting

$$w_i = u_i, \quad \forall i, \qquad (2.7)$$

since then the supremum equals one and $|\Delta|_w = |\Delta|_u$. This leads to the following recovery guarantee for weighted $\ell^1$ minimization:

$$m \gtrsim |\Delta|_u \cdot \log(\epsilon^{-1}) \cdot \log(|\Delta|_u K).$$

Note that, besides constants and log factors, this is no larger than the recovery guarantee

$$m \gtrsim \Big( |\Delta|_u + \max_{i=1,\ldots,K} \{u_i^2\} \, |\Delta| \Big) \cdot \log(\epsilon^{-1}) \cdot \log(|\Delta| K)$$

for unweighted $\ell^1$ minimization (since $|\Delta|_u \leq \max_{i}\{u_i^2\}\,|\Delta|$), and may be substantially smaller depending on the support set $\Delta$.
4. Overcoming the curse of dimensionality. In the case of polynomial approximations in high dimensions, the weighting strategy (2.7) leads to a substantially smaller recovery guarantee for certain structured sparse support sets $\Delta$; specifically, so-called lower sets (see Definition 7.4). These sets are known to be good models for the support sets of polynomial coefficients in high dimensions (see […, 29, 30] and references therein). For example, with tensor Chebyshev polynomials and points drawn randomly from the tensor Chebyshev measure, the recovery guarantee (2.6) yields

$$m \gtrsim 2^d s \times \text{log factors}, \quad \text{if } w_i = 1, \ \forall i,$$

where $s = |\Delta|$. However, if $\Delta$ is also a lower set, one has

$$m \gtrsim s^{\log(3)/\log(2)} \times \text{log factors}, \quad \text{if } w_i = u_i, \ \forall i.$$

Log factors aside (see Remarks 7.8 and 7.9), the latter does not suffer from the curse of dimensionality and agrees with the best known estimates for least-squares approximation in lower sets […]. Similarly, for tensor Legendre polynomials with points drawn randomly from the uniform measure, the recovery guarantee (2.6) yields

$$m \gtrsim s^2 \times \text{log factors}, \quad \text{if } w_i = u_i, \ \forall i,$$

for lower sets $\Delta$ (this result is essentially sharp; recall Remark 2.2), whereas the unweighted guarantee grows exponentially with the dimension of the truncated polynomial space. We refer to §7.3 for further details.


5. Support estimation via weighted $\ell^1$ minimization. Since our recovery guarantee applies to any choice of weights $w_i$, it can be used to explain why the strategy of choosing weights based on a priori knowledge of part of the support set of the coefficients reduces the number of measurements required. Our main result in this setting, detailed in §7.4, shows that if over half of the support set is correctly estimated, then such a weighting strategy leads to a strictly smaller recovery guarantee than that of the unweighted case.
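Items 2 and 3 above can be checked numerically with the following sketch. It assumes NumPy and models the intrinsic weights as $u_i = i^\beta$, so that $\beta = 1/2$ mimics Legendre and $\beta = 0$ Chebyshev polynomials; the function name `factor_26` and all sizes are hypothetical. It evaluates the bracketed factor in (2.6) for $\Delta = \{1, \ldots, M\}$ and $w_i = i^\alpha$ with several $\alpha \geq \beta$, and then compares the choice $w = u$ from (2.7) with the unweighted factor $|\Delta|_u + \max_i\{u_i^2\}\,|\Delta|$ on a random support.

```python
import numpy as np


def factor_26(u, w, in_delta):
    """Bracketed factor in (2.6): |Delta|_u + sup_{i not in Delta} (u_i^2/w_i^2) |Delta|_w."""
    ratio = np.max(u[~in_delta] ** 2 / w[~in_delta] ** 2)
    return np.sum(u[in_delta] ** 2) + ratio * np.sum(w[in_delta] ** 2)


K, M, beta = 1000, 10, 0.5
i = np.arange(1, K + 1, dtype=float)
u = i ** beta                            # model intrinsic weights u_i = i^beta
first_M = i <= M                         # Delta = {1, ..., M}

# Item 2: for alpha >= beta the factor stays of order M^{2 beta + 1} = M^2
for alpha in (1.0, 2.0, 3.0):
    print(alpha, factor_26(u, i ** alpha, first_M))

# Item 3: weights w = u versus unweighted l1 on a random support of size 20
rng = np.random.default_rng(1)
rand_delta = np.zeros(K, dtype=bool)
rand_delta[rng.choice(K, size=20, replace=False)] = True
weighted = factor_26(u, u, rand_delta)                       # equals 2 |Delta|_u
unweighted = np.sum(u[rand_delta] ** 2) + np.max(u ** 2) * rand_delta.sum()
print(weighted, unweighted)
```

With $M = 10$ the factor stays between $M^2/2$ and $M^2$ for every $\alpha$ tried, whereas (2.5) would predict growth like $M^{2\alpha+1}$. Setting `beta = 0` gives a factor that remains of order $M$.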


Remark 2.3. Theorem 2.1 makes several simplifications for illustrative purposes that are not necessary in our main result, Theorem 6.1. First, the measurements $t_1, \ldots, t_m$ can be drawn from a measure $\mu$ different from the orthogonality measure $\nu$ of the functions $\{\phi_i\}$. Second, the measurements can be noisy. In our main result we also allow for indexing over arbitrary countable sets $I$, which is particularly useful in the case of multivariate polynomial approximations.
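Returning to item 4 of the contributions, the following sketch (standard library plus NumPy; all function names are hypothetical) checks the two lower-set estimates on small examples. It assumes the usual notion of a lower set, namely that $i \in \Delta$ and $j \leq i$ componentwise imply $j \in \Delta$, which we take to be what Definition 7.4 formalizes; checking only the immediate predecessors $i - e_j$ suffices. The sums $|\Delta|_u = \sum_{i \in \Delta} u_i^2$ use $u_i^2 = 2^{|i|_0}$ (tensor Chebyshev) and $u_i^2 = \prod_j (2i_j + 1)$ (tensor Legendre) and are compared with $s^{\log(3)/\log(2)}$ and $s^2$, where $s = |\Delta|$.

```python
import itertools

import numpy as np


def is_lower(S):
    """True if the set S of multi-indices is downward closed."""
    S = set(S)
    for i in S:
        for j, ij in enumerate(i):
            if ij > 0 and i[:j] + (ij - 1,) + i[j + 1:] not in S:
                return False
    return True


def total_degree(d, K):
    """Multi-indices i in N_0^d with i_1 + ... + i_d <= K (a lower set)."""
    return [i for i in itertools.product(range(K + 1), repeat=d) if sum(i) <= K]


def card_u_chebyshev(S):
    """Sum over S of u_i^2 = 2^{|i|_0} (tensor Chebyshev, Chebyshev sampling)."""
    return sum(2 ** sum(1 for ij in i if ij > 0) for i in S)


def card_u_legendre(S):
    """Sum over S of u_i^2 = prod_j (2 i_j + 1) (tensor Legendre, uniform sampling)."""
    return sum(int(np.prod([2 * ij + 1 for ij in i])) for i in S)


cube = list(itertools.product((0, 1), repeat=3))   # {0,1}^3, a lower set
td = total_degree(4, 5)
for S in (cube, td):
    s = len(S)
    print(is_lower(S), s, card_u_chebyshev(S), s ** np.log2(3), card_u_legendre(S), s ** 2)
```

For the cube $\{0,1\}^3$ both estimates hold with equality ($27 = 8^{\log_2 3}$ and $64 = 8^2$), which illustrates that the exponents cannot be lowered in general.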
3 Preliminaries


Throughout this paper, $D \subseteq \mathbb{R}^d$ will be a domain and $\nu$ will be an integrable nonnegative weight function on $D$. We write $L^2_\nu(D)$ for the space of complex-valued weighted square-integrable functions on $D$, with norm $\|\cdot\|_{L^2_\nu}$ and inner product $\langle \cdot, \cdot \rangle_{L^2_\nu}$. Given $D$ and $\nu$, we let $\{\phi_i\}_{i \in I} \subseteq L^2_\nu(D) \cap L^\infty(D)$ be a set of functions that are orthonormal with respect to $\nu$, where $I$ is an index set that is at most countable. For a function $f \in \overline{\mathrm{span}}\{\phi_i : i \in I\}$ we let $x = \{x_i\}_{i \in I} \in \ell^2(I)$ be its coefficients in the system $\{\phi_i\}_{i \in I}$, i.e.

$$f = \sum_{i \in I} x_i \phi_i, \qquad x_i = \langle f, \phi_i \rangle_{L^2_\nu}.$$

Our goal throughout this paper is to recover the coefficients $x$ from a small number of pointwise evaluations of $f$.

Several other pieces of standard notation will also be used. The space of square-summable sequences indexed over $I$ will be denoted by $\ell^2(I)$, with its norm and inner product given by $\|\cdot\|$ and $\langle \cdot, \cdot \rangle$ respectively. The set $\{e_i\}_{i \in I}$ denotes the canonical basis of $\ell^2(I)$ and, if $\Delta \subseteq I$, we write $P_\Delta$ for the orthogonal projection onto $\mathrm{span}\{e_i : i \in \Delta\}$. Whenever convenient, we will also regard $P_\Delta$ for finite $\Delta$ as a mapping with range $\mathbb{C}^{|\Delta|}$. Similarly, we will consider a vector $x \in \mathrm{Ran}(P_\Delta)$ interchangeably as a sequence $x \in \ell^2(I)$ supported on the set $\Delta$ and as a vector in $\mathbb{C}^{|\Delta|}$.

3.1 Weighted spaces and sparsity 

For the remainder of this paper, $w = \{w_i\}_{i \in I}$ will be a set of positive weights used in the optimization problem. Define the space of weighted summable sequences by

$$\ell^1_w(I) = \Big\{ x = \{x_i\}_{i \in I} : \|x\|_{1,w} := \sum_{i \in I} w_i |x_i| < \infty \Big\}.$$

Note that we may write the norm as $\|x\|_{1,w} = \|Wx\|_1$, where

$$W = \mathrm{diag}(w_1, w_2, \ldots) \qquad (3.1)$$

is the infinite diagonal matrix of weights and $\|\cdot\|_1$ is the standard unweighted $\ell^1$ norm. For a set $\Delta \subseteq I$ we also define its weighted cardinality by

$$|\Delta|_w = \sum_{i \in \Delta} w_i^2.$$

If $w_i = 1$, $\forall i \in I$, then we merely write $|\Delta|$ for the corresponding unweighted cardinality of $\Delta$.
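For finitely supported sequences these definitions translate directly into array operations. The following minimal NumPy sketch, with arbitrary illustrative values and hypothetical function names, checks that $\|x\|_{1,w} = \|Wx\|_1$ for a finite section of $W$ in (3.1) and computes a weighted cardinality $|\Delta|_w$; complex entries are allowed, as in $\ell^1_w(I)$.

```python
import numpy as np


def weighted_l1_norm(x, w):
    """||x||_{1,w} = sum_i w_i |x_i| for a finitely supported sequence x."""
    return float(np.sum(np.asarray(w, dtype=float) * np.abs(x)))


def weighted_cardinality(delta, w):
    """|Delta|_w = sum_{i in Delta} w_i^2 (Delta given as 0-based indices)."""
    return float(np.sum(np.asarray(w, dtype=float)[list(delta)] ** 2))


w = np.array([1.0, 2.0, 3.0, 4.0])
x = np.array([1.0, -2.0, 3j, 0.0])
W = np.diag(w)                                   # finite section of (3.1)

print(weighted_l1_norm(x, w))                    # 14.0
print(np.linalg.norm(W @ x, 1))                  # 14.0, i.e. ||W x||_1
print(weighted_cardinality({0, 2}, w))           # 10.0 = w_0^2 + w_2^2
```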

3.2 Sampling points and the operator U 

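Given $\nu$ and $\{\phi_i\}_{i \in I}$, let $\mu$ be a probability measure on $D$ satisfying

$$\sup_{t \in D} \sqrt{\nu(t)/\mu(t)}\, |\phi_i(t)| < \infty, \quad \forall i \in I, \qquad (3.2)$$

and define

$$u_i := \max\Big\{ 1, \sup_{t \in D} \sqrt{\nu(t)/\mu(t)}\, |\phi_i(t)| \Big\}, \quad \forall i \in I. \qquad (3.3)$$

Note that we do not require the sequence $u = \{u_i\}_{i \in I}$ to be uniformly bounded in $i$. Given such a probability measure $\mu$, we assume from now on that the sampling points $t_1, \ldots, t_m$ are drawn independently from $\mu$.

With $\nu$, $\mu$ and $\{t_i\}_{i=1}^{m}$ in hand, we define the infinite matrix $U$ as follows:

$$U = \Big\{ \sqrt{\nu(t_i)/\mu(t_i)}\, \phi_j(t_i) \Big\}_{i = 1, \ldots, m,\; j \in I}. \qquad (3.4)$$

We shall view $U$ interchangeably as an infinite matrix and as an operator. The following lemma identifies an appropriate domain on which $U$ is bounded: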


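Lemma 3.1. Let $u = \{u_i\}_{i \in I}$ and $U$ be as in (3.3) and (3.4) respectively. Then the operator $U : \ell^1_u(I) \to \mathbb{C}^m$ is bounded.

Proof. Let $x \in \ell^1_u(I)$. Then, for each $i = 1, \ldots, m$, $|(Ux)_i| \leq \sum_{j \in I} |x_j|\, |\phi_j(t_i)| \sqrt{\nu(t_i)/\mu(t_i)} \leq \sum_{j \in I} u_j |x_j| = \|x\|_{1,u}$. Hence $\|Ux\| \leq \sqrt{m}\, \|x\|_{1,u}$. □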
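The operator (3.4) and the entrywise bound in the proof of Lemma 3.1 can be illustrated with a minimal NumPy sketch for the third scenario of §5: orthonormal Legendre polynomials, $\nu(t) = 1/2$, and points drawn from the Chebyshev density $\mu(t) = 1/(\pi\sqrt{1-t^2})$, truncated to finitely many columns. Instead of the exact $u_j$ it uses the upper bounds $\sqrt{\pi/2}$ for $j = 0$ and $\sqrt{2}$ for $j \geq 1$, which follow from the Legendre estimate quoted in §5; since these dominate $u_j$, the check $\max_i |(Ux)_i| \leq \sum_j u_j |x_j|$ remains valid. The sizes, seed and test vector are arbitrary, and `build_U` is a hypothetical name.

```python
import numpy as np
from numpy.polynomial import legendre as L


def build_U(t, n_cols):
    """Columns 0..n_cols-1 of U in (3.4): sqrt(nu(t_i)/mu(t_i)) * phi_j(t_i), with
    phi_j = sqrt(2j+1) P_j orthonormal for nu = 1/2 and mu the Chebyshev density."""
    phi = L.legvander(t, n_cols - 1) * np.sqrt(2 * np.arange(n_cols) + 1.0)
    ratio = 0.5 * np.pi * np.sqrt(1.0 - t ** 2)      # nu(t) / mu(t)
    return np.sqrt(ratio)[:, None] * phi


rng = np.random.default_rng(2)
m, n = 30, 60
t = np.cos(np.pi * rng.random(m))                     # t_i drawn from mu
U = build_U(t, n)

u_bound = np.full(n, np.sqrt(2.0))                    # upper bound for u_j, j >= 1
u_bound[0] = np.sqrt(np.pi / 2)                       # upper bound for u_0
x = rng.standard_normal(n) / (1.0 + np.arange(n)) ** 2

lhs = np.max(np.abs(U @ x))                           # max_i |(Ux)_i|
rhs = np.sum(u_bound * np.abs(x))                     # >= ||x||_{1,u}
print(lhs <= rhs)
```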


4 Infinite-dimensional weighted $\ell^1$ minimization

In this section, we introduce the infinite-dimensional weighted $\ell^1$ minimization formulation that will be used throughout this paper.

Given a function $f$ and sampling points $\{t_i\}_{i=1}^{m}$, the measurements will be of the form

$$f(t_i) + e_i, \quad i = 1, \ldots, m,$$

where $e_i$ are noise terms satisfying the weighted estimate

$$\sum_{i=1}^{m} \frac{\nu(t_i)}{\mu(t_i)} |e_i|^2 \leq \eta^2,$$

for some known noise parameter $\eta \geq 0$. Define the scaled noise vector

$$\tilde{e} = \{\tilde{e}_i\}_{i=1}^{m}, \qquad \tilde{e}_i = \sqrt{\nu(t_i)/\mu(t_i)}\, e_i, \qquad \|\tilde{e}\| \leq \eta,$$

and note that the measurements $y = \{y_i\}_{i=1}^{m}$ can be expressed as

$$y = Ux + \tilde{e}, \qquad y_i = \sqrt{\nu(t_i)/\mu(t_i)}\, f(t_i) + \tilde{e}_i,$$

where $U$ is as in (3.4). Suppose now that $w = \{w_i\}_{i \in I}$ are weights. As in […], we consider the optimization problem

$$\inf_{z \in \ell^1_u(I)} \|z\|_{1,w} \ \text{subject to} \ \|Uz - y\| \leq \eta. \qquad (4.1)$$

Note that in the absence of noise, i.e. $\eta = 0$, solutions of this problem exactly interpolate the function $f$ at the points $t_1, \ldots, t_m$. Moreover, unlike (2.1), there is no need to know bounds for the expansion tail in order to formulate (4.1).

Unfortunately, this problem cannot be solved numerically, since it involves minimizing over an infinite-dimensional space. To overcome this, we need to truncate (4.1) in such a way that we retain the important properties of (4.1) noted above. We do this as follows. For $K = 1, 2, \ldots$ let $I_K \subseteq I$ be a subset of finite cardinality and let $P_K = P_{I_K}$ denote the projection onto $I_K$. We shall assume that the sequence of projections $\{P_K\}_{K \in \mathbb{N}}$ converges strongly to the identity operator on $\ell^2(I)$, i.e.

$$P_K x \to x, \quad \forall x \in \ell^2(I).$$
Note that we do not require the sets $I_K$ to be nested, although this will often be the case in practice. Given the subsets $I_K$, we now replace (4.1) with the following problem:

$$\min_{z \in P_K(\ell^1_w(I))} \|z\|_{1,w} \ \text{subject to} \ \|UP_K z - y\| \leq \eta. \qquad (4.2)$$

This problem is equivalent to a weighted $\ell^1$ minimization problem on $\mathbb{C}^{|I_K|}$, and is therefore numerically solvable. In particular, the space $P_K(\ell^1_w(I))$ is isomorphic to $\mathbb{C}^{|I_K|}$ and $UP_K$ is equivalent to the $m \times |I_K|$ matrix formed by the columns of $U$ with indices in $I_K$. Importantly, however, (4.2) retains the key features of (4.1). Namely, there is no need to know the expansion tail, and in the absence of noise solutions of (4.2) interpolate $f$ exactly at the data points.
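For real-valued data and $\eta = 0$, (4.2) is a linear program, so a small instance can be solved without a dedicated CS package. The sketch below is not the SPGL1-based procedure used for the experiments in this paper; it swaps in SciPy's `linprog` (HiGHS backend) via the standard splitting $z = p - q$ with $p, q \geq 0$. It assumes the one-dimensional Legendre case with uniform sampling (so $\nu = \mu$ and $U_{ij} = \phi_j(t_i)$), $I_K = \{0, \ldots, K-1\}$, weights $w_j = u_j = \sqrt{2j+1}$ as in (2.7), and the function $f(t) = \sqrt{1.05 + t}$ from Fig. 1. Noisy data ($\eta > 0$) leads to a second-order cone program instead and is not covered here.

```python
import numpy as np
from numpy.polynomial import legendre as L
from scipy.optimize import linprog


def weighted_l1_interpolant(A, y, w):
    """Solve min sum_j w_j |z_j| subject to A z = y (real data) as a linear program."""
    n = A.shape[1]
    c = np.concatenate([w, w])                     # objective w^T (p + q)
    res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=y,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n] - res.x[n:]                   # z = p - q


def f(t):
    return np.sqrt(1.05 + t)


rng = np.random.default_rng(3)
m, K = 40, 120
t = rng.uniform(-1.0, 1.0, m)                      # uniform sampling: nu = mu
scale = np.sqrt(2 * np.arange(K) + 1.0)            # phi_j = sqrt(2j+1) P_j and u_j = scale[j]
U = L.legvander(t, K - 1) * scale                  # m x K matrix U P_K
z = weighted_l1_interpolant(U, f(t), w=scale)      # weights w = u

# With eta = 0 the approximation interpolates f at the sample points
print(np.max(np.abs(U @ z - f(t))))
grid = np.linspace(-1.0, 1.0, 2001)
approx = (L.legvander(grid, K - 1) * scale) @ z
print(np.max(np.abs(approx - f(grid))))            # uniform error on a grid
```

Note that nothing in the formulation refers to the expansion tail of $f$; only $K$ and the weights have to be chosen.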


A key issue for (4.2) is the choice of the truncation parameter $K$. Loosely speaking, $K$ should be taken sufficiently large that the additional error induced by solving (4.2) instead of (4.1) is small. It is important, however, that this criterion be independent of the function $f$ to be approximated, i.e. it should not involve the expansion tail $\sum_{i \in I \setminus I_K} x_i \phi_i$, since a priori estimates for this term are generally unavailable (recall §2.2). Fortunately, as we demonstrate in §6, this is indeed possible.


5 Main examples: tensor Chebyshev and Legendre polynomials 


To illustrate the main results of this paper, we consider the case of tensor products of Chebyshev and Legendre polynomials on the unit hypercube $D = (-1,1)^d$. Recall that one-dimensional Legendre and Chebyshev polynomials are orthogonal with respect to the measures

$$\nu(t) = \frac{1}{2} \quad \text{and} \quad \nu(t) = \frac{1}{\pi\sqrt{1 - t^2}}, \quad t \in (-1,1),$$

respectively. In the hypercube, the corresponding orthogonality measures are

$$\nu(t) = 2^{-d} \quad \text{and} \quad \nu(t) = \prod_{j=1}^{d} \frac{1}{\pi (1 - t_j^2)^{1/2}}, \quad t = (t_1, \ldots, t_d) \in (-1,1)^d.$$

If $\phi_0, \phi_1, \ldots$ are the univariate orthonormal polynomials of Chebyshev or Legendre type, we define the multivariate system via tensor products:

$$\phi_i(t) = \prod_{j=1}^{d} \phi_{i_j}(t_j), \quad t = (t_1, \ldots, t_d) \in (-1,1)^d, \quad i = (i_1, \ldots, i_d) \in \mathbb{N}_0^d.$$

Within this setup, we shall address the following three specific sampling scenarios, all of which are permissible under the condition (3.2):

Tensor Chebyshev polynomials, random sampling from the Chebyshev measure. In this case, the measures are given by $\nu(t) = \mu(t) = \prod_{j=1}^{d} \frac{1}{\pi\sqrt{1 - t_j^2}}$. Since univariate Chebyshev polynomials satisfy $|\phi_0(t)| = 1$ and $\sup_{t \in (-1,1)} |\phi_i(t)| = \sqrt{2}$ otherwise, we find that

$$u_i = \sup_{t \in D} |\phi_i(t)| = 2^{|i|_0/2}, \qquad (5.1)$$

for this example, where $|i|_0 = |\{j : i_j \neq 0\}|$ for $i = (i_1, \ldots, i_d) \in \mathbb{N}_0^d$.
Tensor Legendre polynomials, random sampling from the uniform measure. In this case, $\nu(t) = \mu(t) = 2^{-d}$. Since the univariate Legendre polynomial satisfies $\sup_{t \in (-1,1)} |\phi_i(t)| = \sqrt{2i + 1}$, $i \in \mathbb{N}_0$, it follows that (3.2) holds with

$$u_i = \sup_{t \in D} |\phi_i(t)| = \prod_{j=1}^{d} \sqrt{2 i_j + 1}. \qquad (5.2)$$

Tensor Legendre polynomials, random sampling from the Chebyshev measure. In this case, $\nu(t) = 2^{-d}$ and $\mu(t) = \prod_{j=1}^{d} \frac{1}{\pi\sqrt{1 - t_j^2}}$. Hence

$$u_i = \max\Big\{ 1, \sup_{t \in (-1,1)^d} (\pi/2)^{d/2} \prod_{j=1}^{d} (1 - t_j^2)^{1/4} |\phi_{i_j}(t_j)| \Big\}.$$

It is known that univariate Legendre polynomials satisfy $(1 - t^2)^{1/4} |\phi_0(t)| \leq 1$ and

$$|\phi_i(t)| (1 - t^2)^{1/4} \leq 2/\sqrt{\pi}, \quad t \in [-1,1], \ i \in \mathbb{N}, \qquad (5.3)$$

(see Remark 5.1). Therefore we have

$$u_i \leq (\pi/2)^{d/2} \, (2/\sqrt{\pi})^{|i|_0}, \quad i \in \mathbb{N}_0^d. \qquad (5.4)$$

Unlike the previous cases, this bound is not exact for each $i$. However, it is known that the constant in (5.3) cannot be improved. Hence the bound (5.4) is sharp when taken over all $i \in \mathbb{N}_0^d$.
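The three expressions for $u_i$ are simple functions of the multi-index $i$. The sketch below (NumPy; function names are hypothetical) evaluates (5.1), (5.2) and, for the third scenario, the upper bound $(\pi/2)^{d/2}(2/\sqrt{\pi})^{|i|_0}$ obtained from (5.3) by bounding each factor with $i_j = 0$ by $(\pi/2)^{1/2}$ and each factor with $i_j \geq 1$ by $(\pi/2)^{1/2} \cdot 2/\sqrt{\pi} = \sqrt{2}$.

```python
import numpy as np


def u_chebyshev(i):
    """(5.1): tensor Chebyshev with Chebyshev sampling, u_i = 2^{|i|_0 / 2}."""
    return 2.0 ** (np.count_nonzero(i) / 2)


def u_legendre_uniform(i):
    """(5.2): tensor Legendre with uniform sampling, u_i = prod_j sqrt(2 i_j + 1)."""
    return float(np.prod(np.sqrt(2 * np.asarray(i) + 1.0)))


def u_legendre_chebyshev_bound(i):
    """Upper bound on u_i for tensor Legendre with Chebyshev sampling, from (5.3)."""
    d, k = len(i), np.count_nonzero(i)
    return (np.pi / 2) ** (d / 2) * (2 / np.sqrt(np.pi)) ** k


for i in [(0, 0, 0), (3, 0, 1), (2, 5, 1)]:
    print(i, u_chebyshev(i), u_legendre_uniform(i), u_legendre_chebyshev_bound(i))
```

The Legendre/uniform values grow with the polynomial degree, whereas the other two depend only on the number $|i|_0$ of active coordinates.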

Remark 5.1. Let $P_0, P_1, \ldots$ be the classical univariate Legendre polynomials, i.e. with normalization $P_i(1) = 1$, so that $\phi_i(t) = \sqrt{2i + 1}\, P_i(t)$. These polynomials satisfy the following inequality [3] (see also [26] and […]):

$$(\sin\theta)^{1/2} |P_i(\cos\theta)| \leq \Big(\frac{2}{\pi}\Big)^{1/2} (i + 1/2)^{-1/2}, \quad 0 \leq \theta \leq \pi. \qquad (5.5)$$

Substituting $\phi_i$ and writing $t = \cos(\theta)$ immediately gives (5.3), since $(1 - t^2)^{1/4} = (\sin\theta)^{1/2}$ and $\sqrt{2i+1}\,(2/\pi)^{1/2}(i + 1/2)^{-1/2} = 2/\sqrt{\pi}$. Note that (5.5) was first proved by Bernstein with the factor $i^{-1/2}$ instead of $(i + 1/2)^{-1/2}$ on the right-hand side (see [38, Thm. 7.3.3]). This weaker inequality has been used several times in the analysis of CS for function approximation [23, 36]. The sharper bound (5.5) allows one to obtain stronger results in the high-dimensional setting for the case of Legendre polynomials with sampling from the Chebyshev measure. See Corollary 7.7.
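The bound (5.3), and the fact that its constant $2/\sqrt{\pi}$ is approached but not exceeded, can be checked numerically. The following sketch uses SciPy's `eval_legendre` on a fine grid in $\theta$ with $t = \cos\theta$; the grid size and the degree range are arbitrary, and a grid check of course only supports, rather than proves, the inequality.

```python
import numpy as np
from scipy.special import eval_legendre

theta = np.linspace(0.0, np.pi, 20001)
t = np.cos(theta)
weight = (1.0 - t ** 2) ** 0.25          # (1 - t^2)^{1/4} = (sin theta)^{1/2}
constant = 2.0 / np.sqrt(np.pi)

worst = 0.0
for i in range(1, 60):
    phi = np.sqrt(2 * i + 1) * eval_legendre(i, t)     # orthonormal Legendre phi_i
    worst = max(worst, float(np.max(np.abs(phi) * weight)))

print(worst, constant, worst / constant)
```

Replacing $(i + 1/2)^{-1/2}$ by Bernstein's $i^{-1/2}$ would inflate the constant to $\sqrt{2(2i+1)/(\pi i)}$, which exceeds $2/\sqrt{\pi}$ for every $i$.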


In order to formulate (4.2), we also need to choose the finite index sets $I_K$. We shall consider the following three standard constructions. First, the tensor product index set

$$I_K^{\mathrm{TP}} = \{ i \in \mathbb{N}_0^d : |i|_\infty \leq K \},$$

where $|i|_\infty = \max\{i_1, \ldots, i_d\}$. Note that $|I_K^{\mathrm{TP}}| = (K+1)^d$. Although this indexing is arguably the simplest, for moderate $d$ the cardinality of $I_K^{\mathrm{TP}}$ is often too large for computations. A common alternative is the so-called total degree space

$$I_K^{\mathrm{TD}} = \{ i \in \mathbb{N}_0^d : |i|_1 \leq K \},$$

where $|i|_1 = i_1 + \cdots + i_d$ [32]. Note that $|I_K^{\mathrm{TD}}| = \binom{K+d}{d}$. We shall also consider the (isotropic) hyperbolic cross space

$$I_K^{\mathrm{HC}} = \Big\{ i \in \mathbb{N}_0^d : \prod_{j=1}^{d} (i_j + 1) \leq K \Big\}.$$

The exact cardinality of this space is harder to quantify, but it is known to satisfy the upper bound

$$|I_K^{\mathrm{HC}}| \leq \min\Big\{ 2K^3 4^d, \ e^2 K^{2 + \log_2(d)} \Big\}. \qquad (5.6)$$

The first inequality is due to […, Thm. 3.7] (using parameters $T = K$, $s = d$, $a = 1$ and $\delta = 1/2$), and the second follows from the proof of Theorem 4.9 in [25]. See also […].
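For small $d$ and $K$ the tensor product, total degree and hyperbolic cross index sets can simply be enumerated. The sketch below (standard library only; names hypothetical) does this by brute force over a box, confirms the cardinalities $(K+1)^d$ and $\binom{K+d}{d}$, and compares the size of the hyperbolic cross with the right-hand side of (5.6). Brute-force enumeration is only practical for modest $d$.

```python
import itertools
import math


def tensor_product_set(d, K):
    return list(itertools.product(range(K + 1), repeat=d))


def total_degree_set(d, K):
    return [i for i in itertools.product(range(K + 1), repeat=d) if sum(i) <= K]


def hyperbolic_cross_set(d, K):
    # prod_j (i_j + 1) <= K forces each i_j <= K - 1, so the box range(K) suffices
    return [i for i in itertools.product(range(K), repeat=d)
            if math.prod(ij + 1 for ij in i) <= K]


def hyperbolic_cross_bound(d, K):
    """Right-hand side of (5.6)."""
    return min(2 * K ** 3 * 4 ** d, math.e ** 2 * K ** (2 + math.log2(d)))


d, K = 3, 10
print(len(tensor_product_set(d, K)), (K + 1) ** d)
print(len(total_degree_set(d, K)), math.comb(K + d, d))
print(len(hyperbolic_cross_set(d, K)), hyperbolic_cross_bound(d, K))
```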


At this point, we stress that our main result (Theorem 6.1) is general, and applies to arbitrary function systems. We consider Legendre and Chebyshev polynomials with the above sampling schemes since they are popular examples in the literature. But other cases could also be considered within our framework; for example, Jacobi polynomials. Our framework and theoretical guarantees also allow for nonpolynomial systems, e.g. spherical harmonics or piecewise polynomials.


6 Main results 

We now present our main results. For this, we introduce the notation $A \gtrsim B$ or $A \lesssim B$ to mean that there exists a constant $C > 0$ independent of all relevant parameters such that $A \geq CB$ or $A \leq CB$ respectively. In particular, the constant $C$ is independent of the weights used.


6.1 General recovery guarantee 


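To state our results we require two quantities. First, for weights $w = \{w_i\}_{i\in I}$ and $u = \{u_i\}_{i\in I}$ and
a finite set $\Lambda \subseteq I$, we define

$$\mathcal{M}(\Lambda;u,w) = |\Lambda|_u + \max_{i \in I_K \setminus \Lambda}\left\{ u_i^2/w_i^2 \right\} \max\{|\Lambda|_w, 1\}. \tag{6.1}$$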

This quantity will play a crucial role in our estimates for the number of measurements required. 
Second, for weights $w = \{w_i\}_{i\in I}$ and $x \in \ell^1_w(I)$, we define

$$T_{K,w}(x) = \min\left\{ \|x - z\|_{1,w} : z \in P_K(\ell^1_w(I)),\ \|U P_K z - y\| \leq \eta \right\}. \tag{6.2}$$


Loosely speaking, this term determines the additional error incurred due to truncation; that is, in 
solving the computable minimization problem (4.2) rather than (4.1). 
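For concreteness, the quantity (6.1) can be evaluated directly once $u$, $w$, $\Lambda$ and $I_K$ are fixed. The sketch below uses the weighted cardinality $|\Lambda|_w = \sum_{i\in\Lambda} w_i^2$ and stores weights in plain Python sequences indexed by position; the Legendre-type values $u_k = \sqrt{2k+1}$ in the usage example are an illustrative assumption, and the convention that the maximum over an empty set is zero is ours. The truncation term $T_{K,w}(x)$ in (6.2) requires solving an optimization problem and is not computed here.

```python
from math import sqrt


def weighted_cardinality(Lam, w):
    """|Lambda|_w = sum over i in Lambda of w_i^2."""
    return sum(w[i] ** 2 for i in Lam)


def measurement_quantity(Lam, I_K, u, w):
    """M(Lambda; u, w) from (6.1); Lam must be a subset of I_K.

    The max over indices of I_K outside Lambda is taken as 0 if there are none.
    """
    Lam = set(Lam)
    ratio = max((u[i] ** 2 / w[i] ** 2 for i in I_K if i not in Lam), default=0.0)
    return weighted_cardinality(Lam, u) + ratio * max(weighted_cardinality(Lam, w), 1.0)


if __name__ == "__main__":
    I_K = range(4)
    u = [sqrt(2 * k + 1) for k in I_K]       # u_k^2 = 1, 3, 5, 7
    Lam = {0, 1}
    print(measurement_quantity(Lam, I_K, u, u))           # w = u: about 2|Lambda|_u = 8
    print(measurement_quantity(Lam, I_K, u, [1.0] * 4))   # w = 1: about 4 + 7 * 2 = 18
```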


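Theorem 6.1. Let $K \in \mathbb{N}$, $0 < \epsilon < e^{-1}$, $w = \{w_i\}_{i\in I}$ be weights, $x \in \ell^1_w(I)$ and $\Lambda \subseteq I_K$, $\Lambda \neq \emptyset$,
be any set with $\min_{i \in I_K\setminus\Lambda}\{w_i\} \geq 1$. Let $t_1,\ldots,t_m$ be drawn independently from the measure $\mu$.
Then, for all minimizers $\hat{x}$ of (4.2) we have

$$\|x - \hat{x}\| \lesssim \lambda\sqrt{|\Lambda|_w}\left(\eta/\sqrt{m} + \|x - P_K x\|_{1,u}\right) + \|x - P_\Lambda x\|_{1,w} + T_{K,w}(x), \tag{6.3}$$

with probability at least $1 - \epsilon$, provided

$$m \gtrsim \mathcal{M}(\Lambda;u,w) \cdot \log(\epsilon^{-1}) \cdot \log\left(2N\sqrt{\max\{|\Lambda|_w, 1\}}\right), \tag{6.4}$$

where $N = |I_K|$, $u = \{u_i\}_{i\in I}$ and $\mathcal{M}(\Lambda;u,w)$ are as in (3.3) and (6.1) respectively, and

$$\lambda = 1 + \frac{\sqrt{\log(\epsilon^{-1})}}{\sqrt{\log\left(2N\sqrt{\max\{|\Lambda|_w, 1\}}\right)}}.$$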
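The measurement condition (6.4) and the factor $\lambda$ are explicit up to the unspecified constant hidden in $\gtrsim$. As a sketch, the following evaluates $\lambda$ and the right-hand side of (6.4) with that constant set to one (our simplification); this is only useful for comparing scalings, for instance between weight choices, and not for predicting an exact number of samples.

```python
import math


def _log_factor(N, card_w):
    """log(2 N sqrt(max{|Lambda|_w, 1})), the second log factor in (6.4)."""
    return math.log(2 * N * math.sqrt(max(card_w, 1.0)))


def lambda_factor(eps, N, card_w):
    """lambda = 1 + sqrt(log(1/eps)) / sqrt(log(2 N sqrt(max{|Lambda|_w, 1})))."""
    if not 0 < eps < math.exp(-1):
        raise ValueError("Theorem 6.1 assumes 0 < eps < 1/e")
    return 1.0 + math.sqrt(math.log(1.0 / eps)) / math.sqrt(_log_factor(N, card_w))


def measurement_scale(M_value, eps, N, card_w):
    """Right-hand side of (6.4) with the implied constant set to 1."""
    return M_value * math.log(1.0 / eps) * _log_factor(N, card_w)


if __name__ == "__main__":
    eps, N, card_w = math.exp(-2), 100, 4.0
    print(lambda_factor(eps, N, card_w))           # 1 + sqrt(2) / sqrt(log 400)
    print(measurement_scale(8.0, eps, N, card_w))  # 8 * 2 * log(400)
```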
We discuss the consequences of this theorem in detail in the following subsections. However,
let us first note that (6.3) differs from more standard error bounds in CS in two respects, both of
which arise from the truncation. The first is the term $\|x - P_K x\|_{1,u}$ and the second is the truncation
term $T_{K,w}(x)$. Whilst the first term is small for all sufficiently large $K$, the second term needs to
be estimated. In fact, and closely related to this issue, it cannot be taken for granted that (4.2) even
has a solution for all $K$, since $Ux$ may not lie in the range of $UP_K$. This is only true in general if
$x$ is supported on $I_K$, which is typically not the case in practice.

Fortunately, both issues can be readily addressed:

Theorem 6.2 ([?]). For all sufficiently large $K$, we have $\mathrm{Ran}(U) = \mathrm{Ran}(UP_K)$. In particular,
(4.2) has a solution for all large $K$. Moreover, suppose that $\mathrm{rank}(U) = r \leq m$ and $K$ is sufficiently
large so that $\mathrm{rank}(UP_K) = r$. If $x \in \ell^1_w(I)$ then

$$T_{K,w}(x) \leq \left(1 + \|P_K w\|/\sigma_r\right)\|x - P_K x\|_{1,w},$$

where $\sigma_r$ is the $r$th singular value of $UP_K$.

These results imply the following. Provided $K$ is chosen sufficiently large that $1/\sigma_r$ is finite and small,
the truncated problem (4.2) not only has a solution, but the additional error $T_{K,w}(x)$
due to truncation is bounded by the tail error $\|x - P_K x\|_{1,w}$ multiplied by the factor $1 + \|P_K w\|/\sigma_r$. Crucially, the
condition $1/\sigma_r < \infty$ is independent of $x$ (and therefore $f$) and depends only on the sample
points $\{t_i\}_{i=1}^m$ and the system $\{\phi_i\}_{i\in I}$. It can also be easily checked numerically. For theoretical
guarantees relating $K$ to the number of samples $m$ to ensure this condition, we refer to [?].
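As a sketch of such a check, assume (for illustration only) univariate Chebyshev polynomials $T_0,\ldots,T_{N-1}$ spanning $I_K$ and a handful of rational sample points. The scaling factors in $U$ only rescale rows and columns by nonzero numbers, so $\mathrm{rank}(UP_K)$ equals the rank of the plain matrix $\{T_j(t_i)\}$, which the code below computes exactly with rational arithmetic. In floating point one would instead inspect the smallest nonzero singular value $\sigma_r$ with an SVD routine.

```python
from fractions import Fraction


def chebyshev_row(t, N):
    """[T_0(t), ..., T_{N-1}(t)] via T_{n+1}(t) = 2 t T_n(t) - T_{n-1}(t); assumes N >= 1."""
    vals = [Fraction(1), Fraction(t)]
    while len(vals) < N:
        vals.append(2 * t * vals[-1] - vals[-2])
    return vals[:N]


def exact_rank(rows):
    """Rank of a matrix with Fraction entries, by Gauss-Jordan elimination (no rounding)."""
    M = [list(r) for r in rows]
    if not M:
        return 0
    rank = 0
    for c in range(len(M[0])):
        pivot = next((r for r in range(rank, len(M)) if M[r][c] != 0), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(len(M)):
            if r != rank and M[r][c] != 0:
                f = M[r][c] / M[rank][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[rank])]
        rank += 1
        if rank == len(M):
            break
    return rank


if __name__ == "__main__":
    points = [Fraction(k, 7) for k in (-5, -2, 0, 1, 3)]   # m = 5 distinct points
    for N in range(1, 9):
        U_PK = [chebyshev_row(t, N) for t in points]
        print(N, exact_rank(U_PK))   # rank grows with N until it reaches m = 5
```

With distinct points the rank is $\min\{m, N\}$, so here $\mathrm{Ran}(UP_K) = \mathbb{C}^m$ as soon as $N \geq m$.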

Remark 6.3 Note that the term $\|P_K w\|\|x - P_K x\|_{1,w} \to 0$ as $K \to \infty$ given some additional
summability of the coefficients $x$. For example, suppose that $I = \mathbb{N}$, $I_K = \{1,\ldots,K\}$ and the
weights $w_i$ are nondecreasing. Then $\|P_K w\| \leq \sqrt{K} w_K$, and it is straightforward to see that $\|P_K w\|\|x - P_K x\|_{1,w} \leq
\|x - P_K x\|_{1,\tilde{w}}$, where $\tilde{w}_i = \sqrt{i}\,w_i^2$. Hence this term tends to zero as $K \to \infty$ provided $x \in \ell^1_{\tilde{w}}(I)$.


Remark 6.4 In the language of CS, Theorem 6.1 is an example of a nonuniform recovery guarantee
[18]. For uniform guarantees, we refer to [?]. Note that the guarantees in [?] allow the
error to be estimated in the stronger $\ell^1_w$-norm, whereas (6.3) uses the weaker $\ell^2$-norm.
This is a standard discrepancy between uniform and nonuniform-based CS analyses. On the
other hand, nonuniform recovery arguments are more flexible, in the sense that they can be used
to derive recovery guarantees for arbitrary support sets $\Lambda$ without specifying a sparsity model (e.g.
sparsity or weighted sparsity). See [?, 13] for related work in this direction. It is this flexibility
that allows us to derive the bound (6.4). As is also typical, our nonuniform guarantee (6.4) involves
fewer log factors than corresponding uniform guarantees.

Remark 6.5 The considerations of the previous remark aside, the conditions of Theorem 6.1 are
also more general than those of [?]. Specifically, we do not require the weights to satisfy
$w_i \geq u_i$ [?] or $w_i = u_i$ [?], and we do not impose a priori estimates on the expansion tail (recall
the discussion in §2.2 and §2.5).


6.2 A weighted sparsity recovery guarantee 


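In the absence of noise, Theorem 6.1 states that $x$ is recovered up to an error proportional to $\|x -
P_\Lambda x\|_{1,w}$, i.e. the weighted norm of the coefficients of $x$ lying outside $\Lambda$ (together with the truncation terms), provided the number of measurements
is, up to log factors, proportional to

$$\mathcal{M}(\Lambda;u,w) = |\Lambda|_u + \max_{i \in I_K\setminus\Lambda}\left\{u_i^2/w_i^2\right\}\max\{|\Lambda|_w, 1\}.$$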
This bound may at first sight appear rather obscure, since it does not depend solely on the
(weighted) sparsity of $x$. However, the generality of this bound will be useful in subsequent sections
to analyze the performance of weighted $\ell^1$ minimization in different scenarios (recall §2.5). First,
though, we note that Theorem 6.1 immediately implies a weighted sparsity recovery guarantee,
similar to that introduced in [37]:

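Corollary 6.6. Let $w = \{w_i\}_{i\in I}$ be weights satisfying $w_i \geq u_i$, $\forall i \in I$, where $u = \{u_i\}_{i\in I}$ is as in
(3.3) with $\mu = \nu$, i.e. $u_i = \|\phi_i\|_{L^\infty}$. Let $0 < \epsilon < e^{-1}$, $K \in \mathbb{N}$, $x \in \ell^1_w(I)$ and suppose that $t_1,\ldots,t_m$
are drawn independently from the measure $\mu$. Suppose that

$$m \gtrsim s \cdot \log(\epsilon^{-1}) \cdot \log(2N\sqrt{s}),$$

where $N = |I_K|$. Then, for all minimizers $\hat{x}$ of (4.2) we have

$$\|x - \hat{x}\| \lesssim \lambda\sqrt{s}\left(\eta/\sqrt{m} + \|x - P_K x\|_{1,u}\right) + \sigma_{s,K}(x)_{1,w} + T_{K,w}(x), \tag{6.5}$$

with probability at least $1 - \epsilon$, where $\lambda = 1 + \sqrt{\log(\epsilon^{-1})}/\sqrt{\log(2N\sqrt{s})}$, $T_{K,w}(x)$ is as in (6.2) and

$$\sigma_{s,K}(x)_{1,w} = \min\left\{\|x - P_\Lambda x\|_{1,w} : \Lambda \subseteq I_K,\ |\Lambda|_w \leq s\right\} \tag{6.6}$$

is the best weighted $s$-sparse approximation error.

Proof. Let $\Lambda \subseteq I_K$, $|\Lambda|_w \leq s$ be such that $\|x - P_\Lambda x\|_{1,w} = \sigma_{s,K}(x)_{1,w}$. Since $w_i \geq u_i$ we have
$|\Lambda|_u \leq |\Lambda|_w \leq s$ and $\mathcal{M}(\Lambda;u,w) \leq 2|\Lambda|_w \leq 2s$. The result now follows immediately from Theorem 6.1. □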


This result shows that (4.2) attains the best weighted nonlinear approximation error $\sigma_{s,K}(x)_{1,w}$,
up to a constant, using a number of measurements $m$ scaling linearly with $s$. Note that Corollary
6.6 is similar to results found in [?], except for the differences mentioned in Remarks 6.4 and 6.5.
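To make (6.6) concrete, the sketch below computes $\sigma_{s,K}(x)_{1,w}$ by exhaustive search over all subsets of a small index set $I_K = \{0,\ldots,n-1\}$. This brute-force approach is our own illustration and is exponential in $n$; it only shows what is being minimized. It is not how (4.2) is solved: the point of Corollary 6.6 is that weighted $\ell^1$ minimization attains this error up to a constant without knowing the optimal set.

```python
from itertools import combinations


def best_weighted_s_term_error(x, w, s):
    """sigma_{s,K}(x)_{1,w} for I_K = {0, ..., len(x) - 1}, by exhaustive search.

    Minimizes sum_{i not in Lam} w_i |x_i| over Lam with sum_{i in Lam} w_i^2 <= s.
    """
    n = len(x)
    best = sum(wi * abs(xi) for xi, wi in zip(x, w))   # Lam = empty set
    for k in range(1, n + 1):
        for Lam in combinations(range(n), k):
            if sum(w[i] ** 2 for i in Lam) <= s:
                tail = sum(w[i] * abs(x[i]) for i in range(n) if i not in Lam)
                best = min(best, tail)
    return best


if __name__ == "__main__":
    x = [5, 3, 1, 0.5]
    print(best_weighted_s_term_error(x, [1, 1, 1, 1], 2))   # keeps {0, 1}: 1.5
    print(best_weighted_s_term_error(x, [1, 2, 1, 1], 2))   # index 1 too "expensive": 6.5
```

The second call shows the effect of weighting: $w_1^2 = 4 > s$, so the large coefficient $x_1$ can never enter an admissible set and its weighted magnitude stays in the error.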


Remark 6.7 The reader will have noticed that (6.6) is not the true best weighted $s$-sparse approximation
error, but rather the best weighted $s$-sparse approximation error up to some finite
range $I_K$. Since coefficients outside $I_K$ do not form part of the optimization problem (4.2), this
definition is natural (in fact, it has been implicitly assumed in all prior theoretical analysis of CS
for function approximation). More fundamentally, one cannot expect to stably recover arbitrary
$s$-sparse vectors whose coefficients can range over the whole of $I$ (a countable index set) when
taking only a finite number of samples. This lack of instance optimality in infinite dimensions is
discussed in [3].

Having said this, suppose that the weights $w_i$ are increasing in the sense that $\min_{i\in I\setminus I_K}\{w_i\} \to
\infty$ as $K \to \infty$. Then this restriction can be removed, since sets of weighted cardinality $|\Lambda|_w \leq s$
cannot have arbitrary range. In particular, if $K$ is chosen so that $\min_{i\in I\setminus I_K}\{w_i\} > \sqrt{s}$, then every
$\Lambda$ with $|\Lambda|_w \leq s$ satisfies $\Lambda \subseteq I_K$ (since $w_i^2 > s$ for each $i \notin I_K$), and one
can replace $\sigma_{s,K}(x)_{1,w}$ with the true best weighted $s$-sparse approximation error

$$\sigma_s(x)_{1,w} = \min\left\{\|x - P_\Lambda x\|_{1,w} : \Lambda \subseteq I,\ |\Lambda|_w \leq s\right\}.$$


7 Consequences of Theorem 6.1

In this section, we discuss the main consequences of Theorem 6.1 as listed in §2.5.

7.1 Recovery guarantees for linear models


As discussed in §2.4, it may be the case that the most significant coefficients of a given function are
located at the lowest indices with respect to some ordering. Suppose for simplicity that $\Lambda = \{j :
x_j \neq 0\} = I_M$ for some $M$. Then, according to the weighted sparsity recovery guarantee (Corollary
6.6), the number of measurements needed to recover $x$ is, up to log factors, proportional to

$$|I_M|_w = \sum_{i\in I_M} w_i^2.$$

However, this condition depends on the optimization weights $w$ and deteriorates as they increase.
Conversely, the following straightforward corollary of Theorem 6.1 gives a sharper estimate:

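Corollary 7.1. Let $w = \{w_i\}_{i\in I}$ be any weights satisfying

$$\frac{\max_{i\in I_R}\{w_i/u_i\}}{\inf_{i\in I\setminus I_R}\{w_i/u_i\}} \lesssim 1, \qquad \max_{i\in I_R}\{w_i/u_i\} \lesssim R^v, \qquad R \to \infty, \tag{7.1}$$

for some $v \geq 0$, where $u = \{u_i\}_{i\in I}$ is as in (3.3). Let $0 < \epsilon < e^{-1}$, $x \in \ell^1_w(I)$ and suppose
that $t_1,\ldots,t_m$ are drawn independently from the measure $\mu$. If $I_M \subseteq I_K$ and

$$m \gtrsim |I_M|_u \cdot \log(\epsilon^{-1}) \cdot \left(\log\left(2N\sqrt{|I_M|_u}\right) + v\log(M)\right), \tag{7.2}$$

then, for any minimizer $\hat{x}$ of (4.2) we have

$$\|x - \hat{x}\| \lesssim \lambda\sqrt{|I_M|_w}\left(\eta/\sqrt{m} + \|x - P_K x\|_{1,u}\right) + \|x - P_M x\|_{1,w} + T_{K,w}(x), \tag{7.3}$$

with probability at least $1 - \epsilon$, where $\lambda = 1 + \sqrt{\log(\epsilon^{-1})}/\sqrt{\log(2N\sqrt{|I_M|_u})}$.

Proof. We use Theorem 6.1 with $\Lambda = I_M$. Observe that $|I_M|_w \leq \max_{i\in I_M}\{w_i^2/u_i^2\}|I_M|_u \lesssim M^{2v}|I_M|_u$ and

$$\max_{i\in I_K\setminus I_M}\{u_i^2/w_i^2\}\max\{|I_M|_w, 1\} \lesssim \max\{|I_M|_u, 1\} \leq |I_M|_u,$$

where in the final step we use the fact that $u_i \geq 1$, $\forall i$. The result now follows immediately. □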


Observe that (7.2) depends only on the intrinsic weights $u$ and is independent of the optimization
weights $w$, provided these weights satisfy (7.1). Loosely speaking, this means that the $w_i$'s must
grow at least as fast as the $u_i$'s.


Example 7.2 Let $d = 1$, $I = \mathbb{N}$ and $I_M = \{1,\ldots,M\}$ for $M \in \mathbb{N}$. As mentioned in §2.4, an
oscillatory function (with frequency of oscillation $\mathcal{O}(M)$) typically has $x_i = \mathcal{O}(1)$ for $i = 1,\ldots,M$
and $x_i \approx 0$ for $i > M$. Hence a good approximation of such a function occurs only if the first $M$
coefficients are accurately recovered. Suppose now that the weights $w_i = i^\alpha$ for some $\alpha > 0$. Then
according to Corollary 6.6 the number of measurements needed is roughly $|I_M|_w = \sum_{i=1}^M i^{2\alpha} \asymp M^{2\alpha+1}$. Thus, more
measurements are apparently required for more rapidly growing weights, at odds with the results
shown in Fig. 1. Suppose now that the intrinsic weights $u_i \asymp i^\beta$ as $i \to \infty$ for some $\beta \geq 0$. If $\alpha \geq \beta$
then Corollary 7.1 (with $v = \alpha - \beta$) gives that the number of measurements is proportional (up
to log factors) to $|I_M|_u \asymp M^{2\beta+1}$, regardless of $\alpha$.

In the case of univariate Chebyshev or Legendre polynomials with sampling from the Chebyshev
measure (in which case the $u_i$'s are uniformly bounded (see §5) and therefore $\beta = 0$), this result
gives a linear scaling of $m$ with $M$, up to log factors, regardless of the choice of $\alpha$. Similarly,
for Legendre polynomials with sampling from the uniform measure (in which case $\beta = 1/2$), one
deduces a quadratic scaling of $m$ with $M$. Both results are essentially optimal; see Remark 2.2.
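The contrast between Corollary 6.6 and Corollary 7.1 in Example 7.2 can be checked numerically. The sketch below assumes the Legendre/uniform case with $u_i = \sqrt{2i-1}$ for $i \in \mathbb{N}$ (so $\beta = 1/2$ and $|I_M|_u = M^2$), weights $w_i = i^\alpha$, and an illustrative truncation $I_K = \{1,\ldots,40\}$. It evaluates $\mathcal{M}(I_M;u,w)$ from (6.1) relative to $|I_M|_u$, alongside $|I_M|_w$, the quantity appearing in Corollary 6.6.

```python
import math


def M_quantity(Lam, I_K, u, w):
    """M(Lambda; u, w) from (6.1), with u and w given as functions of the index."""
    Lam = set(Lam)
    card_u = sum(u(i) ** 2 for i in Lam)
    card_w = sum(w(i) ** 2 for i in Lam)
    ratio = max(u(i) ** 2 / w(i) ** 2 for i in I_K if i not in Lam)
    return card_u + ratio * max(card_w, 1.0)


def legendre_u(i):
    """Illustrative intrinsic weight u_i = sqrt(2i - 1), i = 1, 2, ... (beta = 1/2)."""
    return math.sqrt(2 * i - 1)


M, N = 10, 40
I_M, I_K = range(1, M + 1), range(1, N + 1)
ratios = {}
for alpha in (0.0, 0.5, 1.0, 2.0):
    def w(i, a=alpha):
        return i ** a
    ratios[alpha] = M_quantity(I_M, I_K, legendre_u, w) / M**2   # relative to |I_M|_u
    card_w = sum(w(i) ** 2 for i in I_M)                           # Corollary 6.6 scaling
    print(f"alpha={alpha}: M/|I_M|_u = {ratios[alpha]:.3f}, |I_M|_w = {card_w:.0f}")
```

For $\alpha \geq 1/2$ the ratio $\mathcal{M}(I_M;u,w)/|I_M|_u$ stays below about $2.1$ (and decreases with $\alpha$), whereas $|I_M|_w$ grows like $M^{2\alpha+1}$; the unweighted case $\alpha = 0$ pays the factor $\max_{i\in I_K} u_i^2$ instead.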
[Figure 2 panels: LU, d = 2; CC, d = 3; LC, d = 4. Test functions shown include $f(t) = \exp(2t_1)\cos(3t_2)$ and $f(t) = \sin(0.5\exp(t_1t_2t_3))$.]

Figure 2: The error $\|f - \tilde{f}\|_{L^\infty}$ (averaged over 50 trials) against $m$ for Chebyshev (C) or Legendre (L)
polynomials with points drawn from the Chebyshev (C) or uniform (U) measure. A total degree index
set of degree $K$ was used, where $(d,K) = (2,44), (3,17), (4,10)$, and the weights were taken to be either
$w_i = \prod_{j=1}^{d}(i_j + 1)^\alpha$ (top row) or $w_i = (|i|_1 + 1)^\alpha$ (bottom row) for various $\alpha > 0$.


Corollary 7.1 can also be used to assert similar results in higher dimensions. For example,
suppose that the coefficients $x_i$ satisfy

$$|x_i| \asymp \prod_{j=1}^{d}(i_j + 1)^{-\beta}$$

for some $\beta > 0$, as is reasonable in some cases [?]. Then the significant coefficients lie in a
hyperbolic cross $I_M = I^{\mathrm{HC}}_M$. Suppose now that the weights $w_i$ are chosen as $w_i = u_i\prod_{j=1}^{d}(i_j + 1)^\alpha$.
Then (7.1) holds with $v = \alpha$, leading to a measurement condition proportional to $|I^{\mathrm{HC}}_M|_u$. As in
Example 7.2, this is independent of the parameter $\alpha$. Similarly, if the coefficients $|x_i| \asymp p^{-|i|_1}$ for
some $p > 1$ then one may take $I_M = I^{\mathrm{TD}}_M$ to be a total degree index set. If the weights are chosen as
$w_i = u_i(|i|_1 + 1)^\alpha$ then Corollary 7.1 gives a measurement condition proportional to $|I^{\mathrm{TD}}_M|_u$, which
is once more independent of $\alpha$.

Numerical illustrations of these results are given in Fig. 1 (the univariate case) and Fig. 2 (the
multivariate case).


7.2 The choice $1 \leq w_i \leq u_i$


In this and the following two subsections we turn our attention to using Theorem 6.1 to understand
the benefits that weights convey, as opposed to just showing that they lead to no deterioration in
the recovery guarantee. We commence with the following straightforward result:
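Proposition 7.3. Let $w_i = (u_i)^\theta$ for some $0 \leq \theta \leq 1$. Then

$$|\Lambda|_u + \max_{i\in I_K}\{u_i^2/w_i^2\}|\Lambda|_w \leq |\Lambda|_u + \max_{i\in I_K}\{u_i^2\}|\Lambda|.$$

Proof. Since $\Lambda \subseteq I_K$ and $u_i \geq 1$, we have $\max_{i\in I_K}\{u_i^2/w_i^2\}|\Lambda|_w \leq \max_{i\in I_K}\{u_i^{2-2\theta}\}\max_{i\in I_K}\{u_i^{2\theta}\}|\Lambda| = \max_{i\in I_K}\{u_i^2\}|\Lambda|$, as
required. □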

This result shows that choosing weights $w_i$ scaling with the $u_i$'s cannot worsen the recovery
guarantee (except possibly in the log factor) over that of the unweighted ($\theta = 0$) case. Of particular
interest is the extreme case $\theta = 1$, in which case we have

$$|\Lambda|_u + \max_{i\in I_K}\{u_i^2/w_i^2\}|\Lambda|_w = 2|\Lambda|_u, \qquad w_i = u_i.$$

We expect this to be significantly smaller than the corresponding estimate for the unweighted case

$$|\Lambda|_u + \max_{i\in I_K}\{u_i^2/w_i^2\}|\Lambda|_w = |\Lambda|_u + \max_{i\in I_K}\{u_i^2\}|\Lambda|, \qquad w_i = 1,$$

whenever the support set $\Lambda$ does not contain too many high indices, so that the weighted cardinality
$|\Lambda|_u$ is not too large in comparison to the maximum of the $u_i$'s over the range $I_K$. In the next
subsection we will see several concrete examples of this in the case of polynomial approximation.
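As a small illustration of this comparison, consider tensor Legendre polynomials with uniform sampling, where we assume $u_i^2 = \prod_{j=1}^d(2i_j+1)$ (the squared sup-norm of the orthonormal tensor Legendre polynomial). The sketch below evaluates both expressions for the lower set $\Lambda = I^{\mathrm{TD}}_1$ in $d = 3$ with $I_K = I^{\mathrm{TD}}_4$; these particular sets are chosen only for illustration.

```python
from itertools import product
from math import prod


def total_degree_set(d, K):
    """I_K^TD = {i in N_0^d : |i|_1 <= K}."""
    return [i for i in product(range(K + 1), repeat=d) if sum(i) <= K]


def u_sq_legendre(i):
    """Assumed u_i^2 = prod_j (2 i_j + 1) for tensor Legendre with uniform sampling."""
    return prod(2 * ij + 1 for ij in i)


def compare_weightings(Lam, I_K):
    """Return (value for w = u, value for w = 1) of the two expressions above."""
    card_u = sum(u_sq_legendre(i) for i in Lam)
    weighted = 2 * card_u
    unweighted = card_u + max(u_sq_legendre(i) for i in I_K) * len(Lam)
    return weighted, unweighted


if __name__ == "__main__":
    print(compare_weightings(total_degree_set(3, 1), total_degree_set(3, 4)))   # (20, 190)
```

Here $|\Lambda|_u = 10$, while $\max_{i\in I_K} u_i^2 = 45$ (attained at, e.g., $i = (2,1,1)$), so the choice $w_i = u_i$ improves the quantity by almost an order of magnitude.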


7.3 Recovery guarantees for tensor Chebyshev and Legendre expansions 

We now specialize our focus to the case of tensor Chebyshev and Legendre polynomial expansions.
Following on from the previous subsection, we consider the cases $w_i = 1$ and $w_i = u_i$ respectively.
Throughout we let $I = \mathbb{N}_0^d$ and for the truncated spaces we consider total degree spaces $I_K = I^{\mathrm{TD}}_K$.

It will also be useful to first recall the definition of a lower set:

Definition 7.4. A set $\Lambda \subseteq \mathbb{N}_0^d$ is lower if whenever $i = (i_1,\ldots,i_d) \in \Lambda$ and $i' = (i'_1,\ldots,i'_d) \in \mathbb{N}_0^d$
satisfies $i'_j \leq i_j$, $j = 1,\ldots,d$, then $i' \in \Lambda$.

Lower (sometimes also referred to as downward closed) sets are well-known constructions in multivariate
polynomial approximation, since in practice the support sets of polynomial coefficients
are often described by such sets. See [?] and references therein.


Tensor Chebyshev polynomials, random sampling from the Chebyshev measure.

Corollary 7.5. Let $\nu(t) = \mu(t) = \prod_{j=1}^{d}\left(\pi\sqrt{1-t_j^2}\right)^{-1}$. Then, for any $\Lambda \subseteq I_K$ with $|\Lambda| \leq s$ we have

$$\mathcal{M}(\Lambda;u,1) \leq 2^{\min\{d,K\}+1}s, \tag{7.4}$$

provided $I_K = I^{\mathrm{TD}}_K$ is the total degree index set. If $\Lambda \subseteq I_K$ is also a lower set then

$$\mathcal{M}(\Lambda;u,u) \leq 2s^{\log(3)/\log(2)}, \tag{7.5}$$

regardless of the choice of $I_K$. In other words, lower sets of cardinality $s$ can be recovered via
weighted $\ell^1$ minimization with weights $w_i = u_i$ from a number of measurements that is independent
of $d$ for large $d$ and proportional to $s^{\log(3)/\log(2)}$.

Proof. The weights $u_i$ are given by (5.1) in this case. Since $\Lambda$ is a subset of the total degree space,
we have $|i|_0 \leq \min\{d,K\}$ for $i \in \Lambda$. The first result now follows from (5.1) and the definition of
$\mathcal{M}$. For the second we note that $\sum_{i\in\Lambda}2^{|i|_0} \leq |\Lambda|^{\log(3)/\log(2)}$ for any lower set [?]. □
Figure 3: The error $\|f - \tilde{f}\|_{L^\infty}$ (averaged over 50 trials) against $m$ for $f(t) = \exp(-(t_1 + \cdots + t_d)/(2d))$ and
Chebyshev polynomials with points drawn from the Chebyshev measure. A total degree index set of degree
$K$ was used, where $(d, K) = (3, 24), (5,10), (10, 5)$. The weights were taken to be $w_i = (u_i)^\alpha$ for various $\alpha$,
where $u_i$ is as in (5.1).


This result demonstrates the advantage of setting the weights $w_i = u_i$. In high dimensions, one
can recover lower sets of coefficients using a number of measurements that is (up to log factors)
independent of the dimension. Note that the scaling $\log(3)/\log(2)$ is sharp in the sense that it
agrees with the best known estimates for recovering a fixed, known lower set via discrete least-squares
[?]. A numerical illustration of this result is given in Fig. 3. In all dimensions, setting
the weights as $w_i = (u_i)^\alpha$ for some $\alpha > 0$ leads to a smaller approximation error, with the choice
$w_i = u_i$ giving amongst the smallest. This is in good agreement with the above corollary.

We remark that (7.4) was first obtained in [33] (see also [23]) and (7.5) has also been presented
in [12]. Although our main condition is the same, our analysis improves on both results by removing
the requirement for a priori tail estimates (recall §2.2). As discussed, our recovery guarantees also
exhibit fewer log factors (see Remark 6.4).


Tensor Legendre polynomials, random sampling from the uniform measure.

Corollary 7.6. Let $\nu(t) = \mu(t) = 2^{-d}$. Then, for any $\Lambda \subseteq I_K$ with $|\Lambda| \leq s$ we have

$$\mathcal{M}(\Lambda;u,1) \leq 2 \times 3^K s, \tag{7.6}$$

provided $I_K = I^{\mathrm{TD}}_K$ is the total degree space of degree $K$. Conversely, if $\Lambda \subseteq I_K$ is a lower set then

$$\mathcal{M}(\Lambda;u,u) \leq 2s^2, \tag{7.7}$$

regardless of the choice of $I_K$.

Proof. Note that the weights $u_i$ satisfy (5.2). For (7.6) we recall from [?] that $u_i^2 \leq 3^K$, $i \in I_K$,
whenever $I_K$ is the total degree space. For (7.7), we recall that $|\Lambda|_u \leq |\Lambda|^2$ for lower sets [?]. □


As in the previous case, this result clearly illustrates the benefits of choosing weights $w_i = u_i$.
Note that (7.6) was first obtained in [33] (see also [23]) and (7.7) has also been given in [?]. We
remark in passing that the estimate (7.6) is sharp when $d \geq K$, but ceases to be sharp when $d < K$.
For a better bound in this regime, see [23]. Also as in the previous setting, we note that the bound
(7.7) is sharp in the sense that it agrees with the best known estimates for recovery of a fixed lower set
via discrete least squares [?].

[Figure 4 panels: d = 3, d = 5, d = 10]

Figure 4: The error $\|f - \tilde{f}\|_{L^\infty}$ (averaged over 50 trials) against $m$ for $f(t) = \exp(-(t_1 + \cdots + t_d)/d)$ and
Legendre polynomials with points drawn from the uniform measure. A total degree index set of degree $K$
was used, where $(d,K) = (3, 24), (5,10), (10, 5)$. The weights were taken to be $w_i = (u_i)^\alpha$ for various $\alpha$,
where $u_i$ is as in (5.2).


Numerical verification of this result is given in Figure 4. It is worth noting that a somewhat
smaller error in this case can often be achieved by taking larger weights of the form $w_i = (u_i)^\alpha$
with $\alpha > 1$. However, this effect diminishes in higher dimensions.


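Tensor Legendre polynomials, random sampling from the Chebyshev measure.

Corollary 7.7. Let $\nu(t) = 2^{-d}$ and $\mu(t) = \prod_{j=1}^{d}\left(\pi\sqrt{1-t_j^2}\right)^{-1}$. Then, for any $\Lambda \subseteq I_K$ with $|\Lambda| \leq s$
we have

$$\mathcal{M}(\Lambda;u,1) \leq 2 \times (\pi/2)^d(4/\pi)^{\min\{d,K\}}s, \tag{7.8}$$

provided $I_K = I^{\mathrm{TD}}_K$ is the total degree space of degree $K$. Conversely, if $\Lambda \subseteq I_K$ is a lower set then

$$\mathcal{M}(\Lambda;u,u) \leq 2(\pi/2)^d s^{\log(1+4/\pi)/\log(2)}, \tag{7.9}$$

regardless of the choice of $I_K$.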


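Proof. Recall that the weights satisfy (5.4) in this case. The first result follows immediately from
this bound. For the second, we note first that

$$|\Lambda|_u \leq (\pi/2)^d\mathcal{K}(\Lambda), \qquad \mathcal{K}(\Lambda) = \sum_{i\in\Lambda}(4/\pi)^{|i|_0}.$$

We now claim that $\mathcal{K}(\Lambda) \leq |\Lambda|^{\log(1+4/\pi)/\log(2)}$ for any lower set, thus yielding (7.9). To establish
this claim we shall adapt arguments given in [28]. We use induction on $n = |\Lambda|$. If $n = 1$ then
$\Lambda = \{0\}$ (since $\Lambda$ is lower) and the claim trivially holds. Now assume the result holds for all lower sets
(in any dimension) of cardinality at most $n$ and let $\Lambda$ be lower with $|\Lambda| = n + 1$. Without loss of generality, $i_1 \neq 0$ for some $i \in \Lambda$. Let $J$ be the
maximal value of $i_1$ for $i = (i_1,\ldots,i_d) \in \Lambda$ and define the sets

$$\Lambda_k = \left\{i' = (i_2,\ldots,i_d) : (k, i_2,\ldots,i_d) \in \Lambda\right\} \subseteq \mathbb{N}_0^{d-1}, \qquad k = 0,\ldots,J.$$

Notice each $\Lambda_k$ is a lower set and we have the inclusions $\Lambda_J \subseteq \Lambda_{J-1} \subseteq \cdots \subseteq \Lambda_0$. Since $J \geq 1$ we
also have $|\Lambda_k| < |\Lambda|$ for any $k$, hence the induction hypothesis gives

$$\mathcal{K}(\Lambda) = \mathcal{K}(\Lambda_0) + (4/\pi)\sum_{k=1}^{J}\mathcal{K}(\Lambda_k) \leq |\Lambda_0|^\beta + (4/\pi)\sum_{k=1}^{J}|\Lambda_k|^\beta, \tag{7.10}$$

where $\beta = \log(1 + 4/\pi)/\log(2) > 1$. We now claim the following. For $a_0 \geq a_1 \geq \cdots \geq a_n \geq 0$ it
holds that

$$a_0^\beta + (4/\pi)\left(a_1^\beta + \cdots + a_n^\beta\right) \leq (a_0 + a_1 + \cdots + a_n)^\beta.$$

Proof of this claim follows identical steps to that of [28, Lem. 2.3]. Returning to (7.10), we deduce that

$$\mathcal{K}(\Lambda) \leq \left(|\Lambda_0| + |\Lambda_1| + \cdots + |\Lambda_J|\right)^\beta = |\Lambda|^\beta,$$

as required. □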

As in the previous examples, this corollary shows that setting the weights $w_i = u_i$ improves
the recovery guarantee for lower sets. To the best of our knowledge, this result has not appeared
previously.
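The inequality $\sum_{i\in\Lambda} c^{|i|_0} \leq |\Lambda|^{\log(1+c)/\log(2)}$ for lower sets, used with $c = 2$ in Corollary 7.5 and with $c = 4/\pi$ in the proof above, can be sanity-checked exhaustively in two dimensions. There, lower sets of cardinality $n$ are exactly the Young diagrams of the integer partitions of $n$, which makes enumeration easy. The sketch below is a finite check under that two-dimensional restriction, not a proof; the function names are ours.

```python
import math


def partitions(n, max_part=None):
    """Yield the partitions of n as nonincreasing lists of positive integers."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield []
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest


def lower_set_2d(parts):
    """Young diagram of a partition, viewed as a lower set in N_0^2."""
    return [(k, j) for k, height in enumerate(parts) for j in range(height)]


def K_c(Lam, c):
    """Sum over i in Lam of c^{|i|_0}, where |i|_0 counts the nonzero entries of i."""
    return sum(c ** sum(1 for ij in i if ij != 0) for i in Lam)


def check(c, n_max):
    """True if K_c(Lam) <= |Lam|^{log(1+c)/log 2} for every 2D lower set with |Lam| <= n_max."""
    beta = math.log(1 + c) / math.log(2)
    for n in range(1, n_max + 1):
        for parts in partitions(n):
            # small relative tolerance: equality holds for e.g. 2 x 2 squares
            if K_c(lower_set_2d(parts), c) > n ** beta * (1 + 1e-12):
                return False
    return True


if __name__ == "__main__":
    print(check(2.0, 12), check(4 / math.pi, 12))
```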


Remark 7.8 Besides the quantity $\mathcal{M}(\Lambda;u,w)$, the main estimate (6.4) also involves the log factor
$\log(2N\max\{\sqrt{|\Lambda|_w}, 1\})$ (there is also a second log factor depending on the failure probability $\epsilon$,
but this is independent of $N$ and $\Lambda$ and hence will not be discussed further). In the unweighted case
$w_i = 1$, since $|\Lambda| = s \leq N$ this reduces to a log factor proportional to $\log(2N)$. Conversely, in the
case $w_i = u_i$ one has the log factor

$$\log\left(2N\sqrt{|\Lambda|_u}\right) = \log\left(N\sqrt{2\mathcal{M}(\Lambda;u,u)}\right).$$

Corollaries 7.5–7.7 can therefore be used to estimate the right-hand side. In particular, if $\Lambda$ is
lower, then Corollaries 7.5 and 7.6 give a resulting log factor proportional to $\log(2N)$ for the CC
and LU cases, since $\mathcal{M}(\Lambda;u,u)$ is polynomial in $s$ independently of $d$ in these cases, whereas for
the LC case Corollary 7.7 gives a factor proportional to $d + \log(2N)$.


Remark 7.9 When $\Lambda$ is lower and the weights are chosen as $w_i = u_i$, the estimates for $\mathcal{M}(\Lambda;u,u)$
in Corollaries 7.5–7.7 are independent of the choice of truncated space $I_K$ (provided $\Lambda \subseteq I_K$). This
choice only affects the parameter $N = |I_K|$, which, as discussed in the previous remark, arises only
as a log factor in the measurement condition. While we have used a total degree space in our
numerical experiments for simplicity, a viable alternative (introduced in [12]) involves taking $I_K$ to
be the union of all lower sets of cardinality $s$. This is precisely the hyperbolic cross $I^{\mathrm{HC}}_s$ of order
$s$. Estimates for $N$ in this case are given in (5.6), and lead to a bound for the logarithmic factor of
the form $\log(2N) \lesssim \min\{\log(2s) + d, \log(d)\log(2s)\}$.
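The identification of the union of all lower sets of cardinality $s$ with $I^{\mathrm{HC}}_s$ can be checked by brute force in two dimensions, again enumerating lower sets as Young diagrams of integer partitions (valid only for $d = 2$). An index $i$ lies in such a union exactly when the box $\{i' : i' \leq i\}$, which has $\prod_j(i_j+1)$ elements, fits inside some lower set of size $s$; the sketch below confirms this for small $s$.

```python
def partitions(n, max_part=None):
    """Yield the partitions of n as nonincreasing lists of positive integers."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield []
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest


def union_of_lower_sets_2d(s):
    """Union of all lower sets in N_0^2 with exactly s elements (Young diagrams)."""
    out = set()
    for parts in partitions(s):
        out.update((k, j) for k, height in enumerate(parts) for j in range(height))
    return out


def hyperbolic_cross_2d(s):
    """I_s^HC in two dimensions: {(a, b) : (a + 1)(b + 1) <= s}."""
    return {(a, b) for a in range(s) for b in range(s) if (a + 1) * (b + 1) <= s}


if __name__ == "__main__":
    print(all(union_of_lower_sets_2d(s) == hyperbolic_cross_2d(s) for s in range(1, 11)))
```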


7.4 Support estimation via weighted minimization 

We now turn our attention to a different use of weights: namely, to improve recovery performance
when prior information about the support of $x$ is available. As mentioned, a number of recent
works have empirically demonstrated the benefits of this strategy in multivariate polynomial approximation.
In this section, we provide theoretical support for this work.

To do this, we shall assume for simplicity that $u_i = 1$, $\forall i$, although what follows extends to
general $u_i$'s. Let $\Lambda \subseteq I$ be the set of coefficients we wish to recover and suppose that $\Gamma \subseteq I$ is an
estimate for $\Lambda$ based on prior information. In order to exploit this knowledge, we choose weights

$$w_i = \begin{cases} \sqrt{\gamma} & i \in \Gamma \\ 1 & i \notin \Gamma \end{cases}, \tag{7.11}$$

where $0 < \gamma < 1$ is a fixed quantity based on the confidence of our estimate. Define scalars

$$\rho = \frac{|\Lambda\cap\Gamma|}{|\Gamma|}, \qquad \sigma = \frac{|\Gamma|}{|\Lambda|}, \tag{7.12}$$

and observe that $\rho, \sigma \to 1$ as the accuracy of $\Gamma$ increases. Then:

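Corollary 7.10. Let $\Lambda, \Gamma \subseteq I_K$, $|\Lambda| = s$, and suppose that $u_i = 1$, $\forall i$, and $w = \{w_i\}_{i\in I}$ is as in
(7.11). Then, if $\mathcal{M}$ is as in (6.1), we have

$$\mathcal{M}(\Lambda;1,1) = 2s,$$

and

$$\mathcal{M}(\Lambda\cup\Gamma;1,w) = \left(2 + \sigma(1 + \gamma - 2\rho)\right)s,$$

where $\rho$ and $\sigma$ are as in (7.12). In particular, if

$$\rho > \frac{1+\gamma}{2}$$

then

$$\mathcal{M}(\Lambda\cup\Gamma;1,w) < \mathcal{M}(\Lambda;1,1).$$

Proof. Since $w_i = 1$ for $i \notin \Lambda\cup\Gamma$ we have

$$\begin{aligned}
\mathcal{M}(\Lambda\cup\Gamma;1,w) &= |\Lambda\cup\Gamma| + |\Lambda\cup\Gamma|_w \\
&= |\Gamma| + 2|\Lambda\setminus\Gamma| + \gamma|\Gamma| \\
&= |\Gamma| + 2|\Lambda| - 2|\Lambda\cap\Gamma| + \gamma|\Gamma| \\
&= (\sigma + 2 - 2\rho\sigma + \gamma\sigma)\,s \\
&= (2 + \sigma(1 + \gamma - 2\rho))\,s,
\end{aligned}$$

as required. □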

This result implies the following. If $\gamma$ is sufficiently small and if more than half of the estimated
support $\Gamma$ lies in the true support $\Lambda$ (precisely, $\rho > (1+\gamma)/2$), then the above weighting strategy leads to a smaller measurement condition
than in the unweighted case. In other words, weighting based on sufficiently good prior support
estimates can reduce the number of measurements required.
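The identity in Corollary 7.10 is straightforward to verify numerically. The sketch below uses $u_i = 1$ and the weights (7.11), so that $w_i^2 = \gamma$ on $\Gamma$, and evaluates (6.1) directly; the sets, $\gamma = 1/4$ and the truncation $I_K = \{0,\ldots,19\}$ are illustrative choices, with $I_K$ large enough that $I_K\setminus(\Lambda\cup\Gamma)$ is nonempty.

```python
import math


def M_quantity(Lam, I_K, u, w):
    """M(Lambda; u, w) from (6.1), with u and w given as dicts indexed by I_K."""
    Lam = set(Lam)
    card_u = sum(u[i] ** 2 for i in Lam)
    card_w = sum(w[i] ** 2 for i in Lam)
    ratio = max(u[i] ** 2 / w[i] ** 2 for i in I_K if i not in Lam)
    return card_u + ratio * max(card_w, 1.0)


def prior_weights(I_K, Gamma, gamma):
    """Weights (7.11): w_i = sqrt(gamma) on Gamma (so w_i^2 = gamma), 1 elsewhere."""
    return {i: math.sqrt(gamma) if i in Gamma else 1.0 for i in I_K}


def corollary_7_10(Lam, Gamma, gamma):
    """Closed form (2 + sigma (1 + gamma - 2 rho)) s with rho, sigma from (7.12)."""
    s = len(Lam)
    rho = len(Lam & Gamma) / len(Gamma)
    sigma = len(Gamma) / s
    return (2 + sigma * (1 + gamma - 2 * rho)) * s


I_K = range(20)
u = {i: 1.0 for i in I_K}
Lam, gamma = set(range(10)), 0.25
for Gamma in (set(range(5, 15)), set(range(2, 12))):
    w = prior_weights(I_K, Gamma, gamma)
    direct = M_quantity(Lam | Gamma, I_K, u, w)
    print(direct, corollary_7_10(Lam, Gamma, gamma), M_quantity(Lam, I_K, u, u))
```

With $\rho = 0.5 < (1+\gamma)/2$ the weighted quantity ($22.5$) exceeds the unweighted value $2s = 20$, whereas with $\rho = 0.8$ it drops to $16.5$.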


Remark 7.11 Weighted $\ell^1$ minimization with prior support information has been explored
in a number of works [?]. The setup we consider above is based on that of Friedlander et
al. [19]. In most prior works, the measurements are taken to be of random Gaussian type,
which leads to stronger guarantees than ours. We are aware of no works that consider prior support
information for random sampling of orthonormal systems of functions. In passing, we note that the
improved recovery guarantee of Theorem 6.1 is critical to this analysis, since it allows for arbitrary
weights $w_i$.
8 Proof of Theorem 6.1

We now give the proof of Theorem 6.1. As is standard, we first renormalize the problem (4.2) as
follows. Define the infinite matrix

$$A = \left\{\frac{1}{\sqrt{m}}\,\phi_j(t_i)\sqrt{\nu(t_i)/\mu(t_i)}\right\}_{i=1,\,j=1}^{m,\,\infty}. \tag{8.1}$$

Much like $U$, $A$ is a bounded operator from $\ell^1_u(\mathbb{N})$ to $\mathbb{C}^m$ whenever the weights are as in (3.3) (recall
Lemma 3.1). Note also that

$$\mathbb{E}\left((A^*A)_{ij}\right) = \frac{1}{m}\sum_{k=1}^{m}\int_D\overline{\phi_i(t)}\,\phi_j(t)\,\nu(t)\,\mathrm{d}t = \delta_{ij}, \qquad i, j \in I.$$

It follows that (4.2) is equivalent to

$$\min_{z\in P_K(\ell^1_w(I))}\|z\|_{1,w} \ \text{ subject to } \ \|AP_Kz - y\| \leq \eta/\sqrt{m}, \tag{8.2}$$

where $y = Ax + e$, $x$ are the coefficients of $f$ in the basis $\{\phi_i\}_{i\in I}$ and $e$ is a noise vector satisfying
$\|e\| \leq \eta/\sqrt{m}$. We consider this problem from now on.
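In the Chebyshev case ($\nu = \mu$ the Chebyshev measure, orthonormal basis $\phi_0 = 1$, $\phi_n = \sqrt{2}\,T_n$), the identity $\mathbb{E}((A^*A)_{ij}) = \delta_{ij}$ reduces to orthonormality of $\{\phi_n\}$. The sketch below checks this deterministically by replacing the expectation over random points with $q$-point Gauss–Chebyshev quadrature, which integrates polynomials of degree up to $2q-1$ exactly against the Chebyshev probability measure. It also confirms $\|\phi_n\|_{L^\infty} = \sqrt{2}$ for $n \geq 1$, i.e. $u_n^2 = 2$ in this setting.

```python
import math


def phi(n, t):
    """Orthonormal Chebyshev polynomials for the Chebyshev probability measure on [-1, 1]."""
    return 1.0 if n == 0 else math.sqrt(2) * math.cos(n * math.acos(t))


def gram_matrix(n_max, q):
    """Entries of int phi_i phi_j d(Chebyshev measure), via q-point Gauss-Chebyshev quadrature.

    Exact up to rounding whenever i + j <= 2q - 1.
    """
    nodes = [math.cos((2 * k - 1) * math.pi / (2 * q)) for k in range(1, q + 1)]
    return [[sum(phi(i, t) * phi(j, t) for t in nodes) / q for j in range(n_max + 1)]
            for i in range(n_max + 1)]


if __name__ == "__main__":
    G = gram_matrix(5, 16)
    err = max(abs(G[i][j] - (1.0 if i == j else 0.0)) for i in range(6) for j in range(6))
    print(err)                      # close to machine precision
    print(phi(3, 1.0) ** 2)         # u_3^2 = 2 (up to rounding)
```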

The proof now follows a similar route to previous nonuniform recovery guarantees in CS,
although with some significant modifications. We first show that Theorem 6.1 follows from the
existence of a certain dual certificate (Lemma 8.1), and then construct the dual certificate using a
variant of the golfing scheme of D. Gross [21]. Technical lemmas required for this construction are
presented in §8.2. Our argument involves two key novelties. First, the handling of infinite tails, an
issue that does not arise in most previously-considered (i.e. finite-dimensional) CS setups. Second,
the additional complications, and correspondingly refined estimates leading to (6.4), due to the
presence of the weights in the optimization problem.


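8.1 Dual certificate

Lemma 8.1. Let $w = \{w_i\}_{i\in I}$ be positive weights and $\Lambda \subseteq I_K$ be such that $\min_{i\in I_K\setminus\Lambda}\{w_i\} \geq 1$.
Suppose that

$$\text{(i)}: \|P_\Lambda A^*AP_\Lambda - P_\Lambda\| \leq \alpha, \qquad \text{(ii)}: \max_{i\in I_K\setminus\Lambda}\left\{\|Ae_i\|/w_i\right\} \leq \beta,$$

and that there exists a vector $\rho = W^{-1}P_KA^*\xi \in P_K(\ell^2(I))$ for some $\xi \in \mathbb{C}^m$, where $W =
\mathrm{diag}(w_1, w_2, \ldots)$, such that

$$\text{(iii)}: \|W(P_\Lambda\rho - \mathrm{sign}(P_\Lambda x))\| \leq \gamma, \qquad \text{(iv)}: \|P^\perp_\Lambda\rho\|_\infty \leq \theta, \qquad \text{(v)}: \|\xi\| \leq \lambda\sqrt{|\Lambda|_w},$$

for constants $0 < \alpha, \theta < 1$ and $\beta, \gamma, \lambda > 0$ satisfying $\frac{\sqrt{1+\alpha}\,\beta\gamma}{(1-\alpha)(1-\theta)} < 1$. Let $x \in \ell^1_w(I)$, $y = Ax + e$ with
$\|e\| \leq \eta$ and suppose that $\hat{x}$ is a minimizer of the problem

$$\min_{z\in P_K(\ell^1_w(I))}\|z\|_{1,w} \ \text{ subject to } \ \|AP_Kz - y\| \leq \eta.$$

If $\tilde{x} \in P_K(\ell^1_w(I))$ is feasible for this problem, i.e. $\|AP_K\tilde{x} - y\| \leq \eta$, then the estimate

$$\|x - \hat{x}\| \leq \left(C_1 + C_2\lambda\sqrt{|\Lambda|_w}\right)\left(2\eta + \|x - P_Kx\|_{1,u}\right) + C_2\left(2\|x - P_\Lambda x\|_{1,w} + \|x - \tilde{x}\|_{1,w}\right) \tag{8.3}$$

holds, where $C_1 = 1 + \left(1 + \frac{\gamma}{1-\theta}\right)\frac{C_0\sqrt{1+\alpha}}{1-\alpha}$, $C_2 = \left(1 + \frac{\gamma}{1-\theta}\right)\frac{C_0\sqrt{1+\alpha}\,\beta}{(1-\alpha)(1-\theta)} + \frac{1}{1-\theta}$ and $C_0 = \left(1 - \frac{\sqrt{1+\alpha}\,\beta\gamma}{(1-\alpha)(1-\theta)}\right)^{-1}$.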
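Proof. Let $v = \hat{x} - P_Kx \in P_K(\ell^2(I))$. Then $P_\Lambda A^*AP_\Lambda v = P_\Lambda A^*Av - P_\Lambda A^*AP^\perp_\Lambda v$. By (i),

$$\|(P_\Lambda A^*AP_\Lambda)^{-1}\| \leq \frac{1}{1-\alpha},$$

where the inverse is taken on the range of $P_\Lambda$, and

$$\|P_\Lambda A^*\|^2 = \|AP_\Lambda\|^2 = \|P_\Lambda A^*AP_\Lambda\| \leq 1 + \alpha.$$

Thus

$$\|P_\Lambda v\| \leq \frac{1}{1-\alpha}\|P_\Lambda A^*\|\|Av\| + \frac{1}{1-\alpha}\|P_\Lambda A^*AP^\perp_\Lambda v\| \leq \frac{\sqrt{1+\alpha}}{1-\alpha}\left(\|Av\| + \|AP^\perp_\Lambda v\|\right).$$

Observe that $\|Av\| = \|A\hat{x} - AP_Kx\| \leq 2\eta + \|A(x - P_Kx)\|$, and therefore by Lemma 3.1

$$\|Av\| \leq 2\eta + \|x - P_Kx\|_{1,u}. \tag{8.4}$$

Hence

$$\|P_\Lambda v\| \leq \frac{\sqrt{1+\alpha}}{1-\alpha}\left(2\eta + \|x - P_Kx\|_{1,u} + \|AP^\perp_\Lambda v\|\right).$$

The third term can be estimated as follows:

$$\|AP^\perp_\Lambda v\| \leq \sum_{i\in I_K\setminus\Lambda}|v_i|\,\|Ae_i\| \leq \beta\|P^\perp_\Lambda v\|_{1,w},$$

where the latter inequality is due to (ii). Hence we get

$$\|P_\Lambda v\| \leq \frac{\sqrt{1+\alpha}}{1-\alpha}\left(2\eta + \|x - P_Kx\|_{1,u} + \beta\|P^\perp_\Lambda v\|_{1,w}\right). \tag{8.5}$$

We shall return to this inequality later, but let us now consider $\hat{x}$. Since $\hat{x} = P_Kx + v$ and $\Lambda \subseteq I_K$, we have

$$\begin{aligned}
\|\hat{x}\|_{1,w} &= \|P_\Lambda\hat{x}\|_{1,w} + \|P^\perp_\Lambda\hat{x}\|_{1,w} \\
&\geq \mathrm{Re}\langle P_\Lambda Wv, \mathrm{sign}(P_\Lambda x)\rangle + \|P_\Lambda x\|_{1,w} + \|P^\perp_\Lambda v\|_{1,w} - \|P^\perp_\Lambda x\|_{1,w} \\
&= \mathrm{Re}\langle P_\Lambda Wv, \mathrm{sign}(P_\Lambda x)\rangle + \|x\|_{1,w} + \|P^\perp_\Lambda v\|_{1,w} - 2\|P^\perp_\Lambda x\|_{1,w}.
\end{aligned} \tag{8.6}$$

Now let $\tilde{x} \in P_K(\ell^1_w(I))$ be feasible. Then $\|\hat{x}\|_{1,w} \leq \|\tilde{x}\|_{1,w}$ and we get

$$\|\tilde{x}\|_{1,w} \geq \mathrm{Re}\langle P_\Lambda Wv, \mathrm{sign}(P_\Lambda x)\rangle + \|x\|_{1,w} + \|P^\perp_\Lambda v\|_{1,w} - 2\|P^\perp_\Lambda x\|_{1,w},$$

which after rearranging (using $\|\tilde{x}\|_{1,w} - \|x\|_{1,w} \leq \|x - \tilde{x}\|_{1,w}$) gives

$$\|P^\perp_\Lambda v\|_{1,w} \leq |\langle P_\Lambda Wv, \mathrm{sign}(P_\Lambda x)\rangle| + 2\|P^\perp_\Lambda x\|_{1,w} + \|x - \tilde{x}\|_{1,w}. \tag{8.7}$$

We next estimate $|\langle P_\Lambda Wv, \mathrm{sign}(P_\Lambda x)\rangle|$. We have

$$|\langle P_\Lambda Wv, \mathrm{sign}(P_\Lambda x)\rangle| \leq |\langle P_\Lambda Wv, \mathrm{sign}(P_\Lambda x) - P_\Lambda\rho\rangle| + |\langle Wv, \rho\rangle| + |\langle P^\perp_\Lambda Wv, P^\perp_\Lambda\rho\rangle|.$$

Note that, by (iii),

$$|\langle P_\Lambda Wv, \mathrm{sign}(P_\Lambda x) - P_\Lambda\rho\rangle| \leq \gamma\|P_\Lambda v\|,$$

and also that, since $W\rho = P_KA^*\xi$ and $v = P_Kv$,

$$\langle Wv, \rho\rangle = \langle v, W\rho\rangle = \langle v, A^*\xi\rangle = \langle Av, \xi\rangle.$$

Hence, (8.4) and (v) give

$$|\langle Wv, \rho\rangle| \leq \|Av\|\|\xi\| \leq \left(2\eta + \|x - P_Kx\|_{1,u}\right)\lambda\sqrt{|\Lambda|_w}.$$

Finally, by (iv), we have

$$|\langle P^\perp_\Lambda Wv, P^\perp_\Lambda\rho\rangle| \leq \|P^\perp_\Lambda\rho\|_\infty\|P^\perp_\Lambda v\|_{1,w} \leq \theta\|P^\perp_\Lambda v\|_{1,w}.$$

Hence

$$|\langle P_\Lambda Wv, \mathrm{sign}(P_\Lambda x)\rangle| \leq \gamma\|P_\Lambda v\| + \left(2\eta + \|x - P_Kx\|_{1,u}\right)\lambda\sqrt{|\Lambda|_w} + \theta\|P^\perp_\Lambda v\|_{1,w},$$

and substituting into (8.7) and rearranging yields

$$(1-\theta)\|P^\perp_\Lambda v\|_{1,w} \leq \gamma\|P_\Lambda v\| + \left(2\eta + \|x - P_Kx\|_{1,u}\right)\lambda\sqrt{|\Lambda|_w} + 2\|P^\perp_\Lambda x\|_{1,w} + \|x - \tilde{x}\|_{1,w}.$$

Applying (8.5) now gives

$$\|P_\Lambda v\| \leq \frac{\sqrt{1+\alpha}}{1-\alpha}\left[2\eta + \|x - P_Kx\|_{1,u} + \frac{\beta}{1-\theta}\left(\gamma\|P_\Lambda v\| + \left(2\eta + \|x - P_Kx\|_{1,u}\right)\lambda\sqrt{|\Lambda|_w} + 2\|P^\perp_\Lambda x\|_{1,w} + \|x - \tilde{x}\|_{1,w}\right)\right]$$

and therefore, since $\frac{\sqrt{1+\alpha}\,\beta\gamma}{(1-\alpha)(1-\theta)} < 1$,

$$\|P_\Lambda v\| \leq \frac{C_0\sqrt{1+\alpha}}{1-\alpha}\left(1 + \frac{\beta}{1-\theta}\lambda\sqrt{|\Lambda|_w}\right)\left(2\eta + \|x - P_Kx\|_{1,u}\right) + \frac{C_0\sqrt{1+\alpha}\,\beta}{(1-\alpha)(1-\theta)}\left(2\|P^\perp_\Lambda x\|_{1,w} + \|x - \tilde{x}\|_{1,w}\right).$$

Since $w_i \geq 1$, $i \in I_K\setminus\Lambda$, we have $\|P^\perp_\Lambda v\| \leq \|P^\perp_\Lambda v\|_1 \leq \|P^\perp_\Lambda v\|_{1,w}$, hence

$$\begin{aligned}
\|v\| &\leq \|P_\Lambda v\| + \|P^\perp_\Lambda v\|_{1,w} \\
&\leq \left(1 + \frac{\gamma}{1-\theta}\right)\|P_\Lambda v\| + \frac{1}{1-\theta}\left(\left(2\eta + \|x - P_Kx\|_{1,u}\right)\lambda\sqrt{|\Lambda|_w} + 2\|P^\perp_\Lambda x\|_{1,w} + \|x - \tilde{x}\|_{1,w}\right) \\
&\leq \left(C_1 - 1 + C_2\lambda\sqrt{|\Lambda|_w}\right)\left(2\eta + \|x - P_Kx\|_{1,u}\right) + C_2\left(2\|P^\perp_\Lambda x\|_{1,w} + \|x - \tilde{x}\|_{1,w}\right).
\end{aligned}$$

Finally, $\|x - \hat{x}\| \leq \|x - P_Kx\| + \|v\|$, $\|x - P_Kx\| \leq \|x - P_Kx\|_{1,u}$ (recall that $u_i \geq 1$) and $P^\perp_\Lambda x = x - P_\Lambda x$, which gives (8.3),
as required.

□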
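For orientation, the constants $C_0$, $C_1$, $C_2$ of Lemma 8.1 (in the form obtained by following the proof above through to the end) can be evaluated for concrete parameters. The sketch below does this and enforces the lemma's condition on $\alpha, \beta, \gamma, \theta$. The value $\beta = \sqrt{2}$ in the usage example is a hypothetical choice for illustration only, paired with $\alpha = 1/4$, $\gamma = 1/8$, $\theta = 1/2$.

```python
import math


def dual_certificate_constants(alpha, beta, gamma, theta):
    """Return (C0, C1, C2) for the error estimate (8.3) of Lemma 8.1."""
    if not (0 < alpha < 1 and 0 < theta < 1 and beta > 0 and gamma > 0):
        raise ValueError("need 0 < alpha, theta < 1 and beta, gamma > 0")
    kappa = math.sqrt(1 + alpha) * beta * gamma / ((1 - alpha) * (1 - theta))
    if kappa >= 1:
        raise ValueError("condition sqrt(1+alpha) beta gamma / ((1-alpha)(1-theta)) < 1 fails")
    C0 = 1 / (1 - kappa)
    # common factor (1 + gamma/(1-theta)) C0 sqrt(1+alpha)/(1-alpha)
    factor = (1 + gamma / (1 - theta)) * C0 * math.sqrt(1 + alpha) / (1 - alpha)
    C1 = 1 + factor
    C2 = factor * beta / (1 - theta) + 1 / (1 - theta)
    return C0, C1, C2


if __name__ == "__main__":
    # beta = sqrt(2) is a hypothetical value, chosen only for this example
    print(dual_certificate_constants(0.25, math.sqrt(2), 0.125, 0.5))   # approx (2.114, 4.940, 13.144)
```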
8.2 Technical lemmas

For the dual certificate construction in §8.3 we first require the following four technical lemmas:


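Lemma 8.2. Let $\Lambda \subseteq I$, $1 \leq |\Lambda| < \infty$, $0 < \epsilon < 1$, $\delta > 0$ and $u = \{u_i\}_{i\in I}$ be as in (3.3). Then

$$\|P_\Lambda A^*AP_\Lambda - P_\Lambda\| \leq \delta,$$

with probability at least $1 - \epsilon$, provided

$$m \geq |\Lambda|_u \cdot \log(2|\Lambda|/\epsilon) \cdot \left(2\delta^{-2} + 2\delta^{-1}/3\right).$$

Proof. Let $Y_i = \left\{\sqrt{\nu(t_i)/\mu(t_i)}\,\overline{\phi_j(t_i)}\right\}_{j\in\Lambda}$ and observe that

$$P_\Lambda A^*AP_\Lambda - P_\Lambda = \sum_{i=1}^{m}X_i,$$

where $X_i = m^{-1}(Y_iY_i^* - I) \in \mathbb{C}^{|\Lambda|\times|\Lambda|}$ satisfies $\mathbb{E}(X_i) = 0$. Note that

$$\|Y_i\|^2 = \sum_{j\in\Lambda}|\phi_j(t_i)|^2\frac{\nu(t_i)}{\mu(t_i)} \leq \sum_{j\in\Lambda}u_j^2 = |\Lambda|_u.$$

Hence

$$\|X_i\| = \sup_{\|x\|=1}|\langle X_ix, x\rangle| = m^{-1}\sup_{\|x\|=1}\left||\langle Y_i, x\rangle|^2 - 1\right| \leq m^{-1}|\Lambda|_u,$$

where in the last step we use the fact that $|\Lambda|_u \geq |\Lambda| \geq 1$. Also

$$\mathbb{E}(X_i^2) = m^{-2}\,\mathbb{E}\left((\|Y_i\|^2 - 2)Y_iY_i^* + I\right).$$

Since $\mathbb{E}(Y_iY_i^*) = I$ we have

$$|\langle\mathbb{E}(X_i^2)x, x\rangle| \leq m^{-2}|\Lambda|_u\|x\|^2,$$

and therefore

$$\left\|\sum_{i=1}^{m}\mathbb{E}(X_i^2)\right\| = \sup_{\|x\|=1}\sum_{i=1}^{m}\langle\mathbb{E}(X_i^2)x, x\rangle \leq m^{-1}|\Lambda|_u.$$

The result now follows immediately from the matrix Bernstein inequality [18, Cor. 8.15]. □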


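Lemma 8.3. Let $\Lambda \subseteq I$, $1 \leq |\Lambda| < \infty$, $0 < \epsilon < e^{-1}$, $\delta > 0$, $u = \{u_i\}_{i\in I}$ be as in (3.3) and
$z \in P_\Lambda(\ell^2(I))$. Then

$$\|(P_\Lambda A^*AP_\Lambda - P_\Lambda)z\| \leq \delta\|z\|$$

with probability at least $1 - \epsilon$, provided

$$m \geq |\Lambda|_u \cdot \log(\epsilon^{-1}) \cdot \left(8\delta^{-2} + 28\delta^{-1}/3\right).$$

Proof. Let $\|z\| = 1$ without loss of generality. As in the previous proof, write $P_\Lambda A^*AP_\Lambda - P_\Lambda =
\sum_{i=1}^m X_i$, so that

$$\|(P_\Lambda A^*AP_\Lambda - P_\Lambda)z\| = \left\|\sum_{i=1}^{m}Z_i\right\|,$$

where $Z_i = m^{-1}(Y_iY_i^* - I)z$. Observe that $\mathbb{E}(Z_i) = 0$ and the $Z_i$ are independent copies of a single
random vector. Arguing as in the previous lemma, we note that

$$\|Z_i\| \leq m^{-1}|\Lambda|_u,$$

and

$$\sum_{i=1}^{m}\mathbb{E}\|Z_i\|^2 \leq m^{-1}\max\{|\Lambda|_u, 1\}, \qquad \sup_{\|x\|=1}\sum_{i=1}^{m}\mathbb{E}|\langle Z_i, x\rangle|^2 \leq m^{-1}|\Lambda|_u.$$

Suppose that $m \geq 4|\Lambda|_u\delta^{-2}$, so that $\sqrt{\sum_{i=1}^{m}\mathbb{E}\|Z_i\|^2} \leq \delta/2$. It now follows from [18, Cor. 8.45] that

$$\mathbb{P}\left(\|(P_\Lambda A^*AP_\Lambda - P_\Lambda)z\| > \delta\right) \leq \exp\left(-\frac{\delta^2}{8}\,\frac{m}{\max\{|\Lambda|_u, 1\}\left(1 + 7\delta/6\right)}\right),$$

which gives the result (the assumption $m \geq 4|\Lambda|_u\delta^{-2}$ is implied by the stated condition, since $\log(\epsilon^{-1}) > 1$).

□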
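The sample-size conditions in Lemmas 8.2 and 8.3 are obtained by setting the corresponding Bernstein tail bounds equal to $\epsilon$. As a consistency check of that algebra (a sketch with arbitrary illustrative values, not part of the proof), the code below computes each threshold and plugs it back into the tail bound used in the respective proof.

```python
import math


def m_lemma_8_2(card_u, size, eps, delta):
    """|Lambda|_u log(2|Lambda|/eps) (2 delta^-2 + 2 delta^-1 / 3)."""
    return card_u * math.log(2 * size / eps) * (2 / delta**2 + 2 / (3 * delta))


def tail_8_2(m, card_u, size, delta):
    """Matrix Bernstein tail 2|Lambda| exp(-(delta^2/2) / (sigma^2 + L delta/3)), L = sigma^2 = |Lambda|_u/m."""
    return 2 * size * math.exp(-(delta**2 / 2) * m / (card_u * (1 + delta / 3)))


def m_lemma_8_3(card_u, eps, delta):
    """|Lambda|_u log(1/eps) (8 delta^-2 + 28 delta^-1 / 3)."""
    return card_u * math.log(1 / eps) * (8 / delta**2 + 28 / (3 * delta))


def tail_8_3(m, card_u, delta):
    """Vector Bernstein tail exp(-(delta^2/8) m / (max{|Lambda|_u, 1} (1 + 7 delta/6)))."""
    return math.exp(-(delta**2 / 8) * m / (max(card_u, 1.0) * (1 + 7 * delta / 6)))


if __name__ == "__main__":
    card_u, size, eps, delta = 20.0, 10, 0.01, 0.5
    print(tail_8_2(m_lemma_8_2(card_u, size, eps, delta), card_u, size, delta))   # eps
    print(tail_8_3(m_lemma_8_3(card_u, eps, delta), card_u, delta))               # eps
```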


Lemma 8.4. Let 0<e<l, 5>0, w = {wi\i^i he weights, A C and suppose that 
> 1. Then 

max {IIAeill/tCi} < Vl + 5, (8.8) 

*6/k\A 

with probability at least 1 — e, provided 

m>2 max {u//wf} ■ log {2N/e) ■ (5~‘^ + (5~^/3) , 

i&lK\A 


where N = \Ik\ and u = {ttjjjg/ is as in (3.3). 
Proof. Fix i G /j^\A. Then 


|2 /„..2 _ / t * /„„2 _ 1 ^(* j ) 


\\Aei\\^/wf = e*A*Aei/wf = 


m ^ Ktj) 


m 


m 


+ < 


i=i 


j=i 


+ 1) 


where Xj = m ^ ~ /'O’f. Note that E(Aj) = 0 and the A^-’s are independent. 

Moreover, 

\Xj\ < m-\j/wf, 


and 


i=i 


mw. 

1 


Jd \h{t) ) 


yjf) ia{t)‘^ 


\(j)i{t)YiJ.{t) dt-1 


< 


U7 


mwj 


Hence, by Bernstein’s inequality, 

¥{\\Aei\\/wi > \/l + (5) < 2exp ^ ^k\7Y, 

where k = m~^ maxjg 7 ^\^{u^/t(;?}. The result now follows from the union bound. 


□ 
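The following is a small Monte Carlo illustration of Lemma 8.4 under assumptions that are ours, not the paper's: the samples $t_j$ are drawn from the arcsine measure, $\phi_0 = 1$ and $\phi_k = \sqrt{2}\cos(k \arccos t)$ (so $u_0 = 1$ and $u_k = \sqrt{2}$), $I_K = \{0, \ldots, K-1\}$, and every weight equals $1$ (so $w_i \geq 1$ holds). It chooses $m$ from the lemma and records how often $\max_{i \in I_K \setminus \Lambda} \|Ae_i\|/w_i$ exceeds $\sqrt{1+\delta}$; the helper names are hypothetical.

```python
import math
import random


def lemma84_sample_bound(max_ratio_sq, N, eps, delta):
    """m >= 2 max{u_i^2/w_i^2} log(2N/eps) (delta^-2 + delta^-1/3), as in Lemma 8.4."""
    return 2.0 * max_ratio_sq * math.log(2.0 * N / eps) * (1.0 / delta ** 2 + 1.0 / (3.0 * delta))


def column_norms_sq(m, K, rng):
    """||A e_i||^2 = (1/m) sum_j |phi_i(t_j)|^2 for i = 0..K-1, with t_j = cos(theta_j)
    and theta_j uniform on [0, pi), so that t_j follows the arcsine law."""
    thetas = [math.pi * rng.random() for _ in range(m)]
    norms = []
    for i in range(K):
        scale = 1.0 if i == 0 else 2.0  # |phi_0|^2 = 1, |phi_i|^2 = 2 cos^2(i theta)
        norms.append(sum(scale * math.cos(i * th) ** 2 for th in thetas) / m)
    return norms


def failure_rate(K, support, eps, delta, trials, seed=0):
    rng = random.Random(seed)
    m = math.ceil(lemma84_sample_bound(2.0, K, eps, delta))  # max u_i^2 / w_i^2 = 2 here
    failures = 0
    for _ in range(trials):
        norms = column_norms_sq(m, K, rng)
        worst = max(norms[i] for i in range(K) if i not in support)
        failures += worst > 1.0 + delta  # w_i = 1, so compare ||A e_i||^2 with 1 + delta
    return m, failures / trials


print(failure_rate(K=20, support={0, 1, 2}, eps=0.1, delta=0.5, trials=200))
```

In this toy setting the observed failure rate is typically far below $\epsilon$, which is consistent with the bound being a worst-case guarantee.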
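Lemma 8.5. Let $0 < \epsilon < 1$, $\delta > 0$, $z \in \ell^2(I)$, $w = \{w_i\}_{i \in I}$ be weights and $\Lambda \subseteq I_K$. Then
$$
\|P_K P_\Lambda^\perp W^{-1} A^* A P_\Lambda z\|_\infty \leq \delta \|z\|
$$
with probability at least $1 - \epsilon$, provided
$$
m \geq 2 \left( \max_{i \in I_K \setminus \Lambda} \{ u_i^2 / w_i^2 \} \delta^{-2} + \sqrt{|\Lambda|_u} \max_{i \in I_K \setminus \Lambda} \{ u_i / w_i \} \delta^{-1}/3 \right) \cdot \log(2N/\epsilon),
$$
where $N = |I_K|$ and $u = \{u_i\}_{i \in I}$ is as in (3.3).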

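Proof. Let $\|z\| = 1$ without loss of generality. Observe that
$$
\|P_K P_\Lambda^\perp W^{-1} A^* A P_\Lambda z\|_\infty = \max_{i \in I_K \setminus \Lambda} \frac{|e_i^* A^* A P_\Lambda z|}{w_i}.
$$
Fix $i \in I_K \setminus \Lambda$. Writing $A^* A = m^{-1} \sum_{j=1}^{m} Y_j Y_j^*$ as before, we have
$$
\frac{|e_i^* A^* A P_\Lambda z|}{w_i} = \left| \sum_{j=1}^{m} X_j \right|,
$$
where $X_j$ is the random variable
$$
X_j = \frac{1}{m w_i} \, (e_i^* Y_j) \, (Y_j^* P_\Lambda z).
$$
Observe that the $X_j$ are independent and $\mathbb{E}(X_j) = m^{-1} w_i^{-1} e_i^* P_\Lambda z = 0$ since $i \notin \Lambda$. Also
$$
|X_j| \leq \frac{u_i}{w_i m} \left| Y_j^* P_\Lambda z \right| \leq \frac{u_i}{w_i m} \sqrt{|\Lambda|_u},
$$
and
$$
\sum_{j=1}^{m} \mathbb{E}|X_j|^2 \leq \frac{u_i^2}{m w_i^2} \, \mathbb{E}\left| Y_1^* P_\Lambda z \right|^2 \leq \frac{u_i^2}{m w_i^2}.
$$
Therefore, by Bernstein's inequality and the union bound,
$$
\mathbb{P}\left( \|P_K P_\Lambda^\perp W^{-1} A^* A P_\Lambda z\|_\infty > \delta \right) \leq 2N \exp\left( - \frac{m\delta^2/2}{\max_{i \in I_K \setminus \Lambda}\{u_i^2/w_i^2\} + \sqrt{|\Lambda|_u} \max_{i \in I_K \setminus \Lambda}\{u_i/w_i\} \, \delta/3} \right). \tag{8.9}
$$
The result now follows immediately.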

□ 
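As with Lemma 8.3, the sample bound in Lemma 8.5 is exactly the value of $m$ at which the right-hand side of (8.9) equals $\epsilon$. The sketch below checks this numerically; it uses $\max_i\{u_i^2/w_i^2\} = (\max_i\{u_i/w_i\})^2$, which holds because all $u_i, w_i > 0$. Function names and inputs are illustrative.

```python
import math


def lemma85_sample_bound(max_ratio, card_u, N, eps, delta):
    """m >= 2 (max{u/w}^2 delta^-2 + sqrt(|Lambda|_u) max{u/w} delta^-1 / 3) log(2N/eps)."""
    return 2.0 * (max_ratio ** 2 / delta ** 2
                  + math.sqrt(card_u) * max_ratio / (3.0 * delta)) * math.log(2.0 * N / eps)


def tail_8_9(m, max_ratio, card_u, N, delta):
    """Right-hand side of (8.9): Bernstein's inequality combined with a union bound over N indices."""
    denom = max_ratio ** 2 + math.sqrt(card_u) * max_ratio * delta / 3.0
    return 2.0 * N * math.exp(-(m * delta ** 2 / 2.0) / denom)


m = math.ceil(lemma85_sample_bound(math.sqrt(2.0), 9.0, 50, 0.05, 0.1))
print(m, tail_8_9(m, math.sqrt(2.0), 9.0, 50, 0.1) <= 0.05)
```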


8.3 Construction of the dual certificate $\rho$


We are now ready to construct a dual certificate $\rho$ satisfying the conditions of Lemma 8.1. For the parameters, we choose the following values:
$$
\alpha = 1/4, \quad \beta = \sqrt{5/4}, \quad \gamma = 1/8, \quad \theta = 1/2. \tag{8.10}
$$


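Set up. Let $L \in \mathbb{N}$ and suppose that $m_1, \ldots, m_L$ are such that $m_1 + \cdots + m_L = m$. If $U$ is as in (3.4), then write $U^{(1)} \in \mathbb{C}^{m_1 \times \infty}$ for the submatrix of the first $m_1$ rows, $U^{(2)} \in \mathbb{C}^{m_2 \times \infty}$ for the submatrix of the next $m_2$ rows, and so on. Define $\rho^{(0)} = 0$,
$$
\rho^{(l)} = m_l^{-1} W^{-1} P_K (U^{(l)})^* U^{(l)} P_\Lambda W \left( \mathrm{sign}(P_\Lambda(x)) - P_\Lambda \rho^{(l-1)} \right) + \rho^{(l-1)}, \qquad l = 1, 2,
$$
and
$$
v^{(l)} = W \left( \mathrm{sign}(P_\Lambda(x)) - P_\Lambda \rho^{(l)} \right), \qquad l = 0, 1, 2.
$$
Let $\Theta^{(1)} = \{1\}$ and $\Theta^{(2)} = \{1, 2\}$. For $l = 3, \ldots, L$, we define $\Theta^{(l)}$ as follows:
• If
$$
\left\| \left( P_\Lambda - m_l^{-1} P_\Lambda (U^{(l)})^* U^{(l)} P_\Lambda \right) v^{(l-1)} \right\| \leq a_l \|v^{(l-1)}\| \quad \text{and} \quad \left\| m_l^{-1} P_K P_\Lambda^\perp W^{-1} (U^{(l)})^* U^{(l)} P_\Lambda v^{(l-1)} \right\|_\infty \leq b_l \|v^{(l-1)}\|
$$
for constants $a_l$ and $b_l$, then set $\Theta^{(l)} = \Theta^{(l-1)} \cup \{l\}$ and
$$
\rho^{(l)} = m_l^{-1} W^{-1} P_K (U^{(l)})^* U^{(l)} P_\Lambda v^{(l-1)} + \rho^{(l-1)}, \qquad v^{(l)} = W \left( \mathrm{sign}(P_\Lambda x) - P_\Lambda \rho^{(l)} \right).
$$
• Otherwise, set $\Theta^{(l)} = \Theta^{(l-1)}$, $\rho^{(l)} = \rho^{(l-1)}$ and $v^{(l)} = v^{(l-1)}$.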
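The construction above can be sketched in code for a toy, real-valued instance. This is only an illustration under hypothetical choices that are not taken from the paper: the rows of $U$ are Chebyshev vectors $(\phi_0(t), \ldots, \phi_{K-1}(t))$ with $t$ drawn from the arcsine measure, $I_K = \{0, \ldots, K-1\}$, and $W$ is diagonal with user-supplied weights. The sketch applies the updates for $\rho^{(l)}$ and $v^{(l)}$, always keeps the steps $l = 1, 2$, and keeps a later step only when the acceptance tests described above pass (taken here to be the two bounds involving $a_l$ and $b_l$).

```python
import math
import random


def chebyshev_row(theta, K):
    # (phi_0(t), ..., phi_{K-1}(t)) at t = cos(theta); orthonormal for the arcsine measure
    return [1.0] + [math.sqrt(2.0) * math.cos(k * theta) for k in range(1, K)]


def golfing(K, support, signs, weights, block_sizes, a, b, rng):
    """Toy golfing iteration. support = Lambda (a set), signs = {i: sign(x_i)} on Lambda,
    block_sizes = (m_1, ..., m_L), a and b = thresholds (a_1..a_L), (b_1..b_L).
    Returns rho, the final v, the kept steps and (l, ||v^(l-1)||, ||v^(l)||) per kept step."""
    rho = [0.0] * K
    v = [weights[i] * signs[i] if i in support else 0.0 for i in range(K)]  # v^(0) = W sign(P_Lambda x)
    accepted, history = [], []
    for l, m_l in enumerate(block_sizes, start=1):
        rows = [chebyshev_row(math.pi * rng.random(), K) for _ in range(m_l)]
        # g = m_l^{-1} (U^(l))^* U^(l) P_Lambda v, restricted to I_K
        g = [0.0] * K
        for y in rows:
            c = sum(y[i] * v[i] for i in support)
            for i in range(K):
                g[i] += y[i] * c / m_l
        v_new = [v[i] - g[i] if i in support else 0.0 for i in range(K)]  # (P_Lambda - ...) v
        norm_v = math.sqrt(sum(c * c for c in v))
        norm_new = math.sqrt(sum(c * c for c in v_new))
        off = max(abs(g[i]) / weights[i] for i in range(K) if i not in support)
        passes = norm_new <= a[l - 1] * norm_v and off <= b[l - 1] * norm_v
        if l <= 2 or passes:
            rho = [rho[i] + g[i] / weights[i] for i in range(K)]  # rho += W^{-1} P_K g
            v = v_new
            accepted.append(l)
            history.append((l, norm_v, norm_new))
    return rho, v, accepted, history


rng = random.Random(1)
rho, v, accepted, history = golfing(
    K=12, support={0, 1, 2}, signs={0: 1.0, 1: -1.0, 2: 1.0}, weights=[1.0] * 12,
    block_sizes=[80, 80] + [40] * 8, a=[0.5] * 10, b=[0.5] * 10, rng=rng)
print(accepted, math.sqrt(sum(c * c for c in v)))
```

The residual update uses that $W$ is diagonal, so $W P_\Lambda W^{-1} = P_\Lambda$ and $v^{(l)} = v^{(l-1)} - P_\Lambda g$. By construction, every kept step with $l \geq 3$ reduces $\|v\|$ by at least the factor $a_l$.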

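We now define the events $A_1, A_2, B_1, B_2, C, D, E, F$ as follows:
$$
\begin{aligned}
A_l &: \left\| \left( P_\Lambda - m_l^{-1} P_\Lambda (U^{(l)})^* U^{(l)} P_\Lambda \right) v^{(l-1)} \right\| \leq a_l \|v^{(l-1)}\|, \quad l = 1, 2, \\
B_l &: \left\| m_l^{-1} P_K P_\Lambda^\perp W^{-1} (U^{(l)})^* U^{(l)} P_\Lambda v^{(l-1)} \right\|_\infty \leq b_l \|v^{(l-1)}\|, \quad l = 1, 2, \\
C &: |\Theta^{(L)}| \geq R, \\
D &: \|P_\Lambda A^* A P_\Lambda - P_\Lambda\| \leq 1/4, \\
E &: \max_{i \in I_K \setminus \Lambda} \{ \|A e_i\| / w_i \} \leq \sqrt{5/4}, \\
F &: A_1 \cap A_2 \cap B_1 \cap B_2 \cap C \cap D \cap E,
\end{aligned}
$$
where $|\Theta^{(L)}|$ is the cardinality of $\Theta^{(L)}$. If event $F$ occurs, write $\Theta^{(L)} = \{\tau(1), \tau(2), \ldots, \tau(R), \ldots\}$, where the function $\tau$ satisfies $\tau(l) \geq l$ for all $l$, and define the dual certificate as $\rho^{(\tau(R))}$, which we denote by $u$.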


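Choice of the parameters. The idea of the proof is to choose $R$, $L$, $m_1, \ldots, m_L$, $a_1, \ldots, a_L$ and $b_1, \ldots, b_L$ so that conditions (i)-(v) of Lemma 8.1 are fulfilled for the parameter choices (8.10). We make the following choices for these parameters. Write $s = |\Lambda|_w$, $s^* = \max\{s, 1\}$ and set
$$
R = \lceil \log_2(8N\sqrt{s^*}) \rceil, \tag{8.11}
$$
where $N = |I_K|$,
$$
L = 2 + \lceil \log(\epsilon'^{-1}) \rceil + 10R, \tag{8.12}
$$
where $\epsilon' = \epsilon/7$ (see Section 8.4),
$$
a_1 = a_2 = \frac{1}{2\sqrt{\log_2(8N\sqrt{s^*})}}, \qquad a_l = 1/2, \quad l = 3, \ldots, L, \tag{8.13}
$$
and
$$
b_1 = b_2 = \frac{1}{4\sqrt{s}}, \qquad b_l = \frac{\log_2(8N\sqrt{s^*})}{4\sqrt{s}}, \quad l = 3, \ldots, L, \tag{8.14}
$$
$$
m_1 = m_2 = \frac{m}{4}, \qquad m_l = \frac{m}{2(L-2)}, \quad l = 3, \ldots, L.
$$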
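For concreteness, the choices (8.11)-(8.14) can be tabulated as below. This is a sketch: the function name is hypothetical, $\log$ is assumed to be the natural logarithm and $\log_2$ the binary one, the block sizes are returned as real numbers although in practice they would have to be integers, and $\epsilon'$ is supplied by the caller. The final check confirms that the block sizes add up to $m$, since $m_1 + m_2 = m/2$ and $\sum_{l=3}^{L} m_l = m/2$.

```python
import math


def golfing_parameters(N, s, eps_prime, m):
    """R, L, (a_l), (b_l), (m_l) from (8.11)-(8.14), with N = |I_K| and s = |Lambda|_w."""
    s_star = max(s, 1.0)
    log_term = math.log2(8 * N * math.sqrt(s_star))
    R = math.ceil(log_term)
    L = 2 + math.ceil(math.log(1.0 / eps_prime)) + 10 * R
    a = [1.0 / (2.0 * math.sqrt(log_term))] * 2 + [0.5] * (L - 2)
    b = [1.0 / (4.0 * math.sqrt(s))] * 2 + [log_term / (4.0 * math.sqrt(s))] * (L - 2)
    m_blocks = [m / 4.0] * 2 + [m / (2.0 * (L - 2))] * (L - 2)
    return R, L, a, b, m_blocks


R, L, a, b, m_blocks = golfing_parameters(N=100, s=4.0, eps_prime=0.01, m=10_000)
print(R, L, sum(m_blocks))
```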
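Claim: Event $F$ implies conditions (i)-(v). Suppose that event $F$ occurs. Immediately, events $D$ and $E$ give that conditions (i) and (ii) hold with $\alpha = 1/4$ and $\beta = \sqrt{5/4}$. Now consider condition (iii). If $\tau(k-1), \tau(k) \in \Theta^{(L)}$ then note that
$$
v^{(\tau(k))} = W\left( \mathrm{sign}(P_\Lambda(x)) - P_\Lambda \rho^{(\tau(k))} \right) = \left( P_\Lambda - m_{\tau(k)}^{-1} P_\Lambda (U^{(\tau(k))})^* U^{(\tau(k))} P_\Lambda \right) v^{(\tau(k-1))}. \tag{8.15}
$$
Hence, by events $A_1$, $A_2$ and the definition of $\Theta^{(L)}$, and since $\|v^{(0)}\| = \|W \mathrm{sign}(P_\Lambda x)\| \leq \sqrt{s}$,
$$
\|v^{(\tau(k))}\| \leq \left( \prod_{i=1}^{k} a_{\tau(i)} \right) \|v^{(0)}\| \leq \sqrt{s} \prod_{i=1}^{k} a_{\tau(i)}. \tag{8.16}
$$
Observe that
$$
\prod_{i=1}^{k} a_{\tau(i)} = \frac{1}{2^k \log_2(8N\sqrt{s^*})}, \qquad k = 2, \ldots, R. \tag{8.17}
$$
Hence setting $k = R$ in (8.16) and noticing that
$$
W(P_\Lambda u - \mathrm{sign}(P_\Lambda x)) = W(P_\Lambda \rho^{(\tau(R))} - \mathrm{sign}(P_\Lambda x)) = -v^{(\tau(R))}
$$
gives, since $2^R \geq 8N\sqrt{s^*} \geq 8\sqrt{s}$, that
$$
\|W(P_\Lambda u - \mathrm{sign}(P_\Lambda x))\| \leq \sqrt{s} \prod_{j=1}^{R} a_{\tau(j)} = \frac{\sqrt{s}}{2^R \log_2(8N\sqrt{s^*})} \leq \frac{1}{8}. \tag{8.18}
$$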


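Thus condition (iii) holds with $\gamma = 1/8$ as required. Now consider condition (iv). Observe that
$$
P_K P_\Lambda^\perp \rho^{(\tau(k))} = m_{\tau(k)}^{-1} P_K P_\Lambda^\perp W^{-1} (U^{(\tau(k))})^* U^{(\tau(k))} P_\Lambda v^{(\tau(k-1))} + P_K P_\Lambda^\perp \rho^{(\tau(k-1))}.
$$
Therefore
$$
\|P_K P_\Lambda^\perp \rho^{(\tau(k))}\|_\infty \leq b_{\tau(k)} \|v^{(\tau(k-1))}\| + \|P_K P_\Lambda^\perp \rho^{(\tau(k-1))}\|_\infty \leq \sqrt{s} \, b_{\tau(k)} \prod_{j=1}^{k-1} a_{\tau(j)} + \|P_K P_\Lambda^\perp \rho^{(\tau(k-1))}\|_\infty,
$$
where we use the convention that $\prod_{j=1}^{0} a_{\tau(j)} = 1$ when $k = 1$. Hence
$$
\|P_\Lambda^\perp u\|_\infty \leq \sqrt{s} \sum_{k=1}^{R} b_{\tau(k)} \prod_{j=1}^{k-1} a_{\tau(j)}. \tag{8.19}
$$
Substituting the values of $a_l$ and $b_l$ into the right-hand side of (8.19) and using (8.17) gives
$$
\sqrt{s} \sum_{k=1}^{R} b_{\tau(k)} \prod_{j=1}^{k-1} a_{\tau(j)} \leq \frac{1}{4}\left( 1 + \frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2^{R-1}} \right) < \frac{1}{2}.
$$
Hence condition (iv) holds with $\theta = 1/2$ as required.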
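The arithmetic behind (8.18) and the bound for condition (iv) can be checked numerically. The sketch assumes, as in the text, that $\tau(1) = 1$, $\tau(2) = 2$ and $\tau(k) \geq 3$ for $k \geq 3$, so only $a_1$, $a_2$, $a_l = 1/2$ and the corresponding $b$ values enter; the function name and the sample inputs are ours.

```python
import math


def certificate_bounds(N, s):
    """Return sqrt(s) * prod_{k<=R} a_tau(k)  (the bound in (8.18)) and
    sqrt(s) * sum_k b_tau(k) prod_{j<k} a_tau(j)  (the right-hand side of (8.19))."""
    log_term = math.log2(8 * N * math.sqrt(max(s, 1.0)))
    R = math.ceil(log_term)
    a = [1.0 / (2.0 * math.sqrt(log_term))] * 2 + [0.5] * (R - 2)
    b = [1.0 / (4.0 * math.sqrt(s))] * 2 + [log_term / (4.0 * math.sqrt(s))] * (R - 2)
    prod, total = 1.0, 0.0
    for k in range(R):
        total += b[k] * prod  # prod = a_tau(1) ... a_tau(k-1), and 1 for the first term
        prod *= a[k]
    return math.sqrt(s) * prod, math.sqrt(s) * total


for N, s in [(1, 1.0), (100, 4.0), (10 ** 4, 50.0), (7, 0.3)]:
    print(N, s, certificate_bounds(N, s))
```

The first returned value should not exceed $1/8$ and the second should stay below $1/2$, mirroring conditions (iii) and (iv) with $\gamma = 1/8$ and $\theta = 1/2$.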

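Finally consider condition (v). Write $u = W^{-1} P_K A^* \xi$, where $\xi = \xi^{(\tau(R))}$ is defined recursively by $\xi^{(\tau(0))} = 0$ and
$$
\xi^{(\tau(k))} = \frac{\sqrt{m}}{m_{\tau(k)}} \, U^{(\tau(k))} P_\Lambda v^{(\tau(k-1))} + \xi^{(\tau(k-1))},
$$
with $U^{(\tau(k))} P_\Lambda v^{(\tau(k-1))} \in \mathbb{C}^{m_{\tau(k)}}$ extended by zero to the remaining rows. It follows that
$$
\|\xi^{(\tau(k))}\| \leq \frac{\sqrt{m}}{m_{\tau(k)}} \left\| U^{(\tau(k))} P_\Lambda v^{(\tau(k-1))} \right\| + \|\xi^{(\tau(k-1))}\|. \tag{8.20}
$$
Consider the first term on the right-hand side. We have
$$
\left\| U^{(\tau(k))} P_\Lambda v^{(\tau(k-1))} \right\|^2 = m_{\tau(k)} \left\langle m_{\tau(k)}^{-1} P_\Lambda (U^{(\tau(k))})^* U^{(\tau(k))} P_\Lambda v^{(\tau(k-1))}, v^{(\tau(k-1))} \right\rangle = m_{\tau(k)} \left\langle v^{(\tau(k-1))} - v^{(\tau(k))}, v^{(\tau(k-1))} \right\rangle \leq m_{\tau(k)} (a_{\tau(k)} + 1) \|v^{(\tau(k-1))}\|^2,
$$
where in the middle step we use (8.15). By (8.16), it now follows that
$$
\frac{\sqrt{m}}{m_{\tau(k)}} \left\| U^{(\tau(k))} P_\Lambda v^{(\tau(k-1))} \right\| \leq \sqrt{\frac{m}{m_{\tau(k)}} (a_{\tau(k)} + 1)} \, \sqrt{s} \prod_{j=1}^{k-1} a_{\tau(j)},
$$
and therefore, returning to (8.20) and summing over $k = 1, \ldots, R$, we get
$$
\|\xi\| \leq \sqrt{s} \sum_{k=1}^{R} \sqrt{\frac{m}{m_{\tau(k)}} (a_{\tau(k)} + 1)} \prod_{j=1}^{k-1} a_{\tau(j)},
$$
where $\xi$ is such that $u = W^{-1} P_K A^* \xi$. Now notice that
$$
\sqrt{\frac{m}{m_{\tau(k)}} (a_{\tau(k)} + 1)} \leq \sqrt{6}, \quad k = 1, 2, \qquad \sqrt{\frac{m}{m_{\tau(k)}} (a_{\tau(k)} + 1)} \leq \sqrt{3(L-2)}, \quad k = 3, \ldots, R. \tag{8.21}
$$
It follows from (8.21) and (8.17) that
$$
\|\xi\| \leq \sqrt{s} \left( \sqrt{6}\left(1 + \frac{1}{2}\right) + \sqrt{3(L-2)} \sum_{k=3}^{R} \frac{1}{2^{k-1} \log_2(8N\sqrt{s^*})} \right),
$$
and therefore condition (v) holds with
$$
\lambda \lesssim 1 + \frac{\sqrt{\log(\epsilon'^{-1})}}{\log(2N\sqrt{\max\{|\Lambda|_w, 1\}})}. \tag{8.22}
$$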
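Similarly, the sum that bounds $\|\xi\|/\sqrt{s}$ can be evaluated and compared with the further upper bound $\sqrt{6}(1 + 1/2) + \sqrt{3(L-2)}/(2\log_2(8N\sqrt{s^*}))$, obtained from the expression before (8.22) by using $\sum_{k=3}^{R} 2^{-(k-1)} < 1/2$. The sketch again assumes $\tau(1) = 1$, $\tau(2) = 2$ and $\tau(k) \geq 3$ otherwise, uses $m/m_{\tau(k)} = 4$ for $k = 1, 2$ and $2(L-2)$ for $k \geq 3$, and takes $\epsilon'$ as an input; the function name is hypothetical.

```python
import math


def xi_bounds(N, s, eps_prime):
    """Return (sum_k sqrt(m/m_tau(k) * (a_tau(k) + 1)) * prod_{j<k} a_tau(j),
    sqrt(6)(1 + 1/2) + sqrt(3(L-2)) / (2 log2(8 N sqrt(s*))))."""
    log_term = math.log2(8 * N * math.sqrt(max(s, 1.0)))
    R = math.ceil(log_term)
    L = 2 + math.ceil(math.log(1.0 / eps_prime)) + 10 * R
    a = [1.0 / (2.0 * math.sqrt(log_term))] * 2 + [0.5] * (R - 2)
    ratio = [4.0] * 2 + [2.0 * (L - 2)] * (R - 2)  # m / m_tau(k)
    total, prod = 0.0, 1.0
    for k in range(R):
        total += math.sqrt(ratio[k] * (a[k] + 1.0)) * prod
        prod *= a[k]
    simplified = math.sqrt(6.0) * 1.5 + math.sqrt(3.0 * (L - 2)) / (2.0 * log_term)
    return total, simplified


print(xi_bounds(100, 4.0, 0.01))
```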


8.4 Event F holds with high probability 

We now derive conditions on $m$ for event $F$ to hold with probability at least $1 - \epsilon$, where $0 < \epsilon < e^{-1}$. By the union bound, it suffices to prove that each of the events $A_1, A_2, B_1, B_2, C, D, E$ occurs with probability at least $1 - \epsilon'$, where $\epsilon' = \epsilon/7$.


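Events $A_1$, $A_2$ and $B_1$, $B_2$. We apply Lemma 8.3 (conditionally on $v^{(l-1)}$, which is independent of $U^{(l)}$) with $\epsilon = \epsilon'$, $\delta = 1/(2\sqrt{\log_2(8N\sqrt{s^*})})$ and $m = m_l = m/4$. This gives that $\mathbb{P}(A_l^c) \leq \epsilon'$ for $l = 1, 2$ provided $m$ satisfies
$$
m \gtrsim |\Lambda|_u \cdot \log(\epsilon'^{-1}) \cdot \log_2(8N\sqrt{s^*}). \tag{8.23}
$$
(Here and below, $\gtrsim$ means that the inequality holds up to a universal constant.)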
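Similarly, for events $B_1$ and $B_2$ we apply Lemma 8.5 with $\epsilon = \epsilon'$, $\delta = 1/(4\sqrt{s})$ and $m = m_l = m/4$. This gives that $\mathbb{P}(B_l^c) \leq \epsilon'$ for $l = 1, 2$, provided
$$
m \gtrsim \left( |\Lambda|_w \max_{i \in I_K \setminus \Lambda} \{ u_i^2 / w_i^2 \} + \sqrt{|\Lambda|_u |\Lambda|_w} \max_{i \in I_K \setminus \Lambda} \{ u_i / w_i \} \right) \cdot \log(2N/\epsilon').
$$
Since $2\sqrt{ab} \leq a + b$ with $a = |\Lambda|_u$ and $b = |\Lambda|_w \max_{i \in I_K \setminus \Lambda}\{u_i^2/w_i^2\}$, after simplifying we see that it suffices that
$$
m \gtrsim \left( |\Lambda|_u + \max_{i \in I_K \setminus \Lambda} \{ u_i^2 / w_i^2 \} |\Lambda|_w \right) \cdot \log(2N/\epsilon'). \tag{8.24}
$$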


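Event C. Define the random variables $X_1, \ldots, X_{L-2}$ by
$$
X_i = \begin{cases} 1 & \text{if } \Theta^{(i+2)} = \Theta^{(i+1)} \cup \{i+2\}, \\ 0 & \text{otherwise}, \end{cases}
$$
so that, since $|\Theta^{(L)}| = 2 + X_1 + \cdots + X_{L-2}$, we have $\mathbb{P}(C^c) = \mathbb{P}(|\Theta^{(L)}| < R) \leq \mathbb{P}(X_1 + \cdots + X_{L-2} < R)$. Observe that
$$
\mathbb{P}\left( \sum_{i=1}^{L-2} X_i < R \right) = \mathbb{E}\left[ \mathbb{P}\left( X_{L-2} < R - \sum_{i=1}^{L-3} X_i \,\Big|\, X_1, \ldots, X_{L-3} \right) \right].
$$
When conditioned on an instance of $X_1, \ldots, X_{L-3}$, the variable $X_{L-2}$ has a Bernoulli distribution with some parameter $p(X_1, \ldots, X_{L-3})$. If $X$ is a Bernoulli random variable with parameter $p$, then $\mathbb{P}(X < t)$ is a nonincreasing function of $p$ for any fixed $t \in \mathbb{R}$. It follows that
$$
\mathbb{P}\left( \sum_{i=1}^{L-2} X_i < R \right) \leq \mathbb{P}\left( X'_{L-2} + \sum_{i=1}^{L-3} X_i < R \right), \tag{8.25}
$$
where $X'_{L-2}$ is an independent Bernoulli random variable with parameter $p'$ satisfying
$$
p' \leq \min_{x_1, \ldots, x_{L-3}} p(x_1, \ldots, x_{L-3}).
$$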


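We now wish to guarantee that $p' \geq 9/10$. Observe that $X_{L-2} = 0$ if, for $l = L$, either of the following events occurs:
$$
\begin{aligned}
C_1 &: \left\| \left( P_\Lambda - m_l^{-1} P_\Lambda (U^{(l)})^* U^{(l)} P_\Lambda \right) v^{(l-1)} \right\| > a_l \|v^{(l-1)}\|, \\
C_2 &: \left\| m_l^{-1} P_K P_\Lambda^\perp W^{-1} (U^{(l)})^* U^{(l)} P_\Lambda v^{(l-1)} \right\|_\infty > b_l \|v^{(l-1)}\|.
\end{aligned}
$$
Applying Lemma 8.3 with $m = m_L$, $\delta = 1/2$, $\epsilon = 1/20$ and Lemma 8.5 with $m = m_L$, $\delta = \log_2(8N\sqrt{s^*})/(4\sqrt{s})$ and $\epsilon = 1/20$, we now see that $p' \geq 9/10$, provided $m_L$ satisfies
$$
m_L \gtrsim |\Lambda|_u,
$$
and
$$
m_L \gtrsim \left( |\Lambda|_w \max_{i \in I_K \setminus \Lambda} \{ u_i^2 / w_i^2 \} + \sqrt{|\Lambda|_u |\Lambda|_w} \max_{i \in I_K \setminus \Lambda} \{ u_i / w_i \} \right) \log(40N) / \log_2(8N\sqrt{s^*}).
$$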

In some of the first presentations of the golfing scheme [SHU], it was assumed that these random variables were independent, which is not the case in general. This issue was fixed in via a more careful argument. Here we follow the approach of [22].
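Since $m_L = m/(2(L-2))$, these now reduce to
$$
m \gtrsim |\Lambda|_u \cdot \left( \log(\epsilon^{-1}) + \log_2(8N\sqrt{s^*}) \right) \tag{8.26}
$$
and
$$
m \gtrsim \left( |\Lambda|_u + \max_{i \in I_K \setminus \Lambda} \{ u_i^2 / w_i^2 \} |\Lambda|_w \right) \cdot \log(2N) \cdot \log(\epsilon^{-1}), \tag{8.27}
$$
respectively.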


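Assuming now that (8.26) and (8.27) hold, we have $p' \geq 9/10$. Moreover, repeating the same arguments we can iterate the estimate (8.25) to obtain
$$
\mathbb{P}\left( \sum_{i=1}^{L-2} X_i < R \right) \leq \mathbb{P}\left( \sum_{i=1}^{L-2} X'_i < R \right),
$$
where the $X'_i$ are independent Bernoulli random variables with parameter $9/10$. Following the arguments of [2], we deduce that
$$
\mathbb{P}\left( \sum_{i=1}^{L-2} X'_i < R \right) \leq \exp\left( -2(L-2) \left( \frac{9}{10} - \frac{R}{L-2} \right)^2 \right).
$$
By construction, $L - 2 \geq \log(\epsilon'^{-1})$ and $R/(L-2) \leq 1/10$, so
$$
2(L-2) \left( \frac{9}{10} - \frac{R}{L-2} \right)^2 \geq 2 \log(\epsilon'^{-1}) \left( \frac{9}{10} - \frac{1}{10} \right)^2 \geq \log(\epsilon'^{-1}).
$$
Hence
$$
\mathbb{P}(C^c) \leq \mathbb{P}\left( \sum_{i=1}^{L-2} X_i < R \right) \leq \epsilon',
$$
as required.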
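The last step can be illustrated by comparing the exact binomial tail with the Hoeffding-type bound used above, for $R$ and $L$ chosen as in (8.11)-(8.12). This is a numerical sanity check under illustrative inputs, not part of the proof; the function name is hypothetical and $\log$ is assumed to be the natural logarithm.

```python
import math


def event_c_tails(N, s, eps_prime, p=0.9):
    """Exact P(Bin(L-2, p) < R) and the bound exp(-2(L-2)(p - R/(L-2))^2)."""
    R = math.ceil(math.log2(8 * N * math.sqrt(max(s, 1.0))))
    n = math.ceil(math.log(1.0 / eps_prime)) + 10 * R  # n = L - 2 by (8.12)
    exact = sum(math.comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(R))
    hoeffding = math.exp(-2.0 * n * (p - R / n) ** 2)
    return exact, hoeffding


exact, bound = event_c_tails(N=100, s=4.0, eps_prime=0.01)
print(exact <= bound <= 0.01)
```

Both numbers are typically many orders of magnitude below $\epsilon'$, which shows how much slack the choice $L - 2 \geq 10R$ leaves in this step.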

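Events D and E. For event $D$, we apply Lemma 8.2 with $\epsilon = \epsilon'$ and $\delta = 1/4$. This yields that if
$$
m \gtrsim |\Lambda|_u \cdot \log(2|\Lambda|/\epsilon'), \tag{8.28}
$$
then $\mathbb{P}(D^c) \leq \epsilon'$. Similarly, suppose that
$$
m \gtrsim \max_{i \in I_K \setminus \Lambda} \{ u_i^2 / w_i^2 \} \cdot \log(2N/\epsilon'); \tag{8.29}
$$
then Lemma 8.4 (with $\delta = 1/4$) gives that $\mathbb{P}(E^c) \leq \epsilon'$.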

8.5 Proof of Theorem 6.1


Recall that (4.2) is equivalent to (8.2). The estimate now follows from Lemma 8.1 provided conditions (i)-(v) hold. As shown in Sections 8.3 and 8.4, conditions (i)-(v) hold with probability at least $1 - \epsilon$ and with $\lambda$ given by (8.22), provided (8.23), (8.24), (8.26), (8.27), (8.28) and (8.29) hold. Due to the assumption on $\epsilon$ and the choice of $\epsilon'$, these are all implied by (6.4).